[Netarchivesuite-devel] Questions regarding WARC
sara.aubry at bnf.fr
Wed Apr 16 15:00:00 CEST 2014
Hello everyone,
As I mentioned yesterday, we're testing NAS with WARC extensively to
prepare our big move to WARC.
We have a few questions regarding the configuration you chose to
implement:
- are you using the default WARCArchiver from Heritrix
(org.archive.crawler.writer.WARCWriterProcessor) or the one from NAS
(dk.netarkivet.harvester.harvesting.WARCWriterProcessor)?
- from our tests, neither one produces revisit records for duplicates:
is that correct? Would it be complicated to change the WARCWriter from
NAS to produce them?
- we would also like to have a prefix in the metadata file names (either
BnF-1-1-metadata-1.warc or 1-1-metadata-BnF-1.warc). Would that be
easy/possible to implement?
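
To make the two naming options above concrete, here is a minimal sketch in Java. The class and method names are hypothetical and are not part of NAS or Heritrix. It treats the existing metadata file name as an opaque string and only assumes that the name ends with "-<number>.warc", as in the examples above.

```java
// Hypothetical helper, not NAS code: illustrates the two prefix placements only.
final class MetadataNames {

    // Option 1: put the prefix in front of the whole file name.
    static String prependPrefix(String prefix, String fileName) {
        return prefix + "-" + fileName;
    }

    // Option 2: put the prefix just before the trailing "-<number>.warc" part.
    static String insertBeforeLastNumber(String prefix, String fileName) {
        int lastDash = fileName.lastIndexOf('-');
        if (lastDash < 0) {
            // No dash to split on: fall back to option 1.
            return prependPrefix(prefix, fileName);
        }
        return fileName.substring(0, lastDash + 1) + prefix + fileName.substring(lastDash);
    }

    public static void main(String[] args) {
        String name = "1-1-metadata-1.warc";
        System.out.println(prependPrefix("BnF", name));          // BnF-1-1-metadata-1.warc
        System.out.println(insertBeforeLastNumber("BnF", name)); // 1-1-metadata-BnF-1.warc
    }
}
```

Either form would suit us; the sketch is only meant to show where the prefix would go, not how NAS builds the name internally.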
Best,
Sara