[Netarchivesuite-curator] BnF NAS update for September
alexandre.chautemps at bnf.fr
Tue Sep 8 11:52:21 CEST 2020
Dear all,
After the upgrade of NAS and Heritrix in June, we monitored the
evolution of the QA indicators by comparing similar jobs run before and
after the upgrade. The findings are positive: for the same job type, the
new version crawls more URLs with fewer 404 errors. The improvement is
particularly significant for image files, with the number of crawled
images growing by between 19% and 98% depending on the type of job. We
are very happy with this quality improvement; however, we now have to
deal with larger WARC files and reassess our budget estimates. Our annual
broad crawl will be launched in October, and we have to adjust its
parameters carefully in order to stay within the budget forecast.
The new version of BC web (7.3.0), which brings new functionalities such
as record duplication and improvements to the advanced search and to
deduplication, was successfully put into production at the end of July.
Best regards,
The BnF digital legal deposit team
Gradual reopening of the BnF from 6 July; see the arrangements here. Before printing, think of the environment.