[Netarchivesuite-devel] Questions regarding WARC

sara.aubry at bnf.fr
Wed Apr 16 15:00:00 CEST 2014


Hello everyone,

As I mentioned yesterday, we're testing NAS with WARC extensively to
prepare for our big move to WARC.
We have a few questions regarding the configuration you chose to 
implement:
- are you using the default WARCArchiver from Heritrix
(org.archive.crawler.writer.WARCWriterProcessor) or the one from NAS
(dk.netarkivet.harvester.harvesting.WARCWriterProcessor)?
- from our tests, neither one produces revisit records for duplicates:
is that correct? Would it be complicated to change the NAS WARCWriter
so that it writes them?
- we would also like to have a prefix in the metadata file names (either
BnF-1-1-metadata-1.warc or 1-1-metadata-BnF-1.warc). Would that be
possible, and easy, to implement?
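For reference, this is roughly what we would expect a revisit record header to look like. It is a minimal sketch, assuming the "identical-payload-digest" revisit profile from the WARC/1.0 specification. The class RevisitRecordSketch and its method revisitHeader are hypothetical names and are not part of Heritrix or NAS. The digest value used in main is a placeholder, not a real SHA-1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical helper (not part of Heritrix or NetarchiveSuite): builds the
 * header block of a WARC/1.0 "revisit" record that marks a duplicate payload.
 */
public class RevisitRecordSketch {

    // Revisit profile URI defined by the WARC/1.0 specification.
    static final String PROFILE =
        "http://netpreserve.org/warc/1.0/revisit/identical-payload-digest";

    static String revisitHeader(String targetUri, String warcDate,
                                String payloadDigest, String refersToRecordId,
                                int contentLength) {
        // LinkedHashMap keeps the fields in insertion order when written out.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("WARC-Type", "revisit");
        fields.put("WARC-Target-URI", targetUri);
        fields.put("WARC-Date", warcDate);
        // Digest of the payload already archived in an earlier record.
        fields.put("WARC-Payload-Digest", payloadDigest);
        fields.put("WARC-Profile", PROFILE);
        // WARC-Refers-To is optional: it points at the original record, if known.
        if (refersToRecordId != null) {
            fields.put("WARC-Refers-To", refersToRecordId);
        }
        fields.put("WARC-Record-ID", "<urn:uuid:" + UUID.randomUUID() + ">");
        fields.put("Content-Length", Integer.toString(contentLength));

        StringBuilder sb = new StringBuilder("WARC/1.0\r\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        sb.append("\r\n"); // an empty line terminates the header block
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder digest; a real writer would use the base32 SHA-1 of the payload.
        System.out.print(revisitHeader("http://www.bnf.fr/",
            "2014-04-16T13:00:00Z", "sha1:PLACEHOLDERDIGEST", null, 0));
    }
}
```

The point is that a revisit record stores only the headers and the payload digest, not a second copy of the payload.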
Best,

Sara



