[Netarchivesuite-curator] BnF NAS update for September

alexandre.chautemps at bnf.fr
Tue Sep 8 11:52:21 CEST 2020


Dear all,

After the upgrade of NAS and Heritrix in June, we tracked how the QA 
indicators evolved by comparing similar jobs run before and after the 
upgrade. The findings are positive: for the same job type, the new version 
crawls more URLs with fewer 404 errors. The improvement is particularly 
significant for image files, where the number of crawled images grew by 
between 19 % and 98 % depending on the job type. We are very happy with 
this quality improvement; however, we now have to cope with larger WARC 
files and reassess our budget estimate. Our annual broad crawl will be 
launched in October, and we have to adjust its parameters carefully in 
order to stay within the budget forecast. 

The new version of BC web (7.3.0), which brings new functionalities such as 
record duplication and improvements to the advanced search and to 
deduplication, was successfully put into production at the end of July.


Best regards,

The BnF digital legal deposit team
Gradual reopening of the BnF from 6 July: find the details here. Before printing, think about the environment.