[Netarchivesuite-users] Optimizing deduplication index generation
nicolas.giraud at bnf.fr
Mon Jun 22 13:54:54 CEST 2009
Hi,
During my broad harvest tests, I've noticed that generating the
deduplication index takes a very long time. I've currently harvested about
70 GB of data, which is not very much, and generating the index for a new
broad harvest job takes about an hour. Is there a way to store the
previous indices and only incrementally generate the delta for the jobs
that were not previously taken into account? Does index generation already
work that way, or does it start over every time? If not, why does it take
so long?
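To make the idea concrete, here is a minimal sketch of the incremental scheme I have in mind. This is only an illustration of the caching concept, not NetarchiveSuite's actual API (which is Java/Lucene-based); the class and method names are hypothetical:

```python
# Hypothetical sketch: cache a partial dedup index per harvest job, so a new
# broad harvest only pays the indexing cost for jobs not seen before.
# Records are (url, content-digest) pairs; the combined index is their union.

class DedupIndexCache:
    def __init__(self):
        self._per_job = {}  # job_id -> {url: digest}

    def index_job(self, job_id, records):
        """Build the partial index for one job, or reuse the cached one."""
        if job_id not in self._per_job:  # only new jobs cost indexing work
            self._per_job[job_id] = dict(records)
        return self._per_job[job_id]

    def combined_index(self, job_ids):
        """Merge the cached partial indexes for the requested jobs."""
        merged = {}
        for job_id in job_ids:
            merged.update(self._per_job[job_id])
        return merged


cache = DedupIndexCache()
cache.index_job(1, [("http://a.example/x", "sha1:aaa")])
cache.index_job(2, [("http://b.example/y", "sha1:bbb")])
# A later request covering jobs 1 and 2 reuses both cached partial indexes
# instead of re-reading the underlying ARC files:
index = cache.combined_index([1, 2])
```

With something like this, only the delta (newly completed jobs) would need to be indexed for each new broad harvest, and the merge of already-built partial indexes should be much cheaper than starting over.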
Best regards,
Nicolas
Consider the environment before printing this mail.