[Netarchivesuite-users] Heritrix version and deduplication
nicolas.giraud at bnf.fr
Wed Apr 29 14:55:26 CEST 2009
Hi,
Currently we are using Heritrix 1.14 in production, so the production team
would feel more comfortable keeping the same version when we move NAS into
production. I understand that the supplied version of Heritrix is a
patched 1.12.1, with code added to handle deduplication. The production
team has two main questions:
1) Is there a way to properly turn off deduplication? We use Wayback, and
deduplication information would not appear to the end user, which the
librarians are not OK with. However, I believe there might be a way to
generate CDX indexes from the deduplication logs. Any insight?
2) Is there a way to replace the supplied Heritrix version with 1.14,
possibly losing the deduplication features?
My personal opinion is that deduplication is a major feature, and I would
like to use it in production, but I would like some background information
to be able to discuss alternatives with the production team.
Cheers,
Nicolas