[Netarchivesuite-curator] BnF NAS update for September

peter.stirling at bnf.fr peter.stirling at bnf.fr
Thu Sep 11 11:43:53 CEST 2014


Hello all,

We have just started preparing our annual broad crawl. Over the coming
month, we will check each part of the process. First, we will prepare
the ingest of sources into NetarchiveSuite: we have 4.4 million domains
from registrars, complemented by 29,000 URLs from BCWeb and other BnF
databases. Then, we will configure a number of crawlers, taken from
selective harvests, for use in the broad crawl. This year, the engineers
for web legal deposit will pay special attention to the environment of
the crawlers, because they ran into many problems last year; for example,
they will make sure that communication with the storage racks works
properly. Finally, as the total volume has to be limited to around 55 to
60 TB, we have decided to run the crawl in one step instead of two, and
we have to estimate the maximum budget we can give to each domain.
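
As a rough illustration of the budget estimate, the sketch below divides the volume limit evenly across the registrar domains. It is only a back-of-the-envelope lower bound, not the method we will actually use: it assumes decimal terabytes, ignores the 29,000 additional URLs, and the function name is hypothetical. Since many domains are much smaller than the average, the real per-domain maximum could likely be set higher.

```python
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)

def average_budget_bytes(total_tb, domains):
    """Even split of the total volume over all domains, in whole bytes."""
    return total_tb * TB // domains

DOMAINS = 4_400_000  # registrar domains mentioned above

for total_tb in (55, 60):
    avg = average_budget_bytes(total_tb, DOMAINS)
    print(f"{total_tb} TB over {DOMAINS:,} domains: "
          f"about {avg / 10**6:.1f} MB per domain")
```

With these figures the even split comes to roughly 12.5 to 13.6 MB per domain.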

Best regards,
The BnF digital legal deposit team