[Netarchivesuite-curator] BnF NAS update for January

geraldine.camile at bnf.fr
Wed Jan 8 09:18:38 CET 2020

Dear All,

A Happy New Year and best wishes to all for 2020 from the BnF web 
archiving team!

Our broad crawl, which started on October 12th, finished on December 12th. 
It collected 2.2 billion URLs and 118.17 TB of compressed data. Despite 
technical problems related to our infrastructure (25% of the jobs were 
killed by their HarvestController because Heritrix took too long to 
initialise), it took less time than last year (11 weeks in 2019). Its size 
exceeds our initial budget of 110 TB because the average weight per URL 
is higher than we estimated (57044 bytes against an estimate of 55421 bytes). 
We will analyse the reports to understand this increase; it probably 
reflects changes in the websites themselves.
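
As a rough check on the figures above, the short Python sketch below 
computes the relative increase in average weight per URL and the relative 
overrun against the budget. The four input values are quoted from this 
message; the variable names and the percent_increase helper are 
illustrative assumptions, not part of NetarchiveSuite or Heritrix.

```python
# Figures quoted in the message; names are illustrative only.
budget_tb = 110.0                 # initial budget, in TB
actual_tb = 118.17                # compressed size of the finished crawl, in TB
estimated_bytes_per_url = 55421   # average weight per URL used for the estimate
observed_bytes_per_url = 57044    # average weight per URL actually observed


def percent_increase(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


weight_increase_pct = percent_increase(estimated_bytes_per_url,
                                       observed_bytes_per_url)
overrun_pct = percent_increase(budget_tb, actual_tb)

print(f"average weight per URL: +{weight_increase_pct:.2f}%")
print(f"size versus budget:     +{overrun_pct:.2f}%")
```

The two percentages are computed independently. Comparing them is only 
indicative, since the message does not say how the 110 TB budget was 
derived.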

We welcome a newcomer to our team: Alexandre Faye. He will be in charge of 
cooperation with researchers and of international cooperation.

Best regards,

The BnF digital legal deposit team
