[Netarchivesuite-curator] BnF NAS update for September
peter.stirling at bnf.fr
Thu Sep 11 11:43:53 CEST 2014
Hello all,
We have just started to prepare our annual broad crawl. During the coming
month, we will check the different parts of the process. First, we will
prepare the ingest of sources in NetarchiveSuite: we have 4.4 million
domains from registrars, complemented by 29,000 URLs from BCWeb and other
BnF databases. Then, we will reconfigure a number of the crawlers used for
selective harvests so that they can be used for the broad crawl. This year,
the web legal deposit engineers will pay special attention to the crawlers'
environment, as there were many problems last year: for example, they will
check that communication with the storage racks works properly. Finally, as the
total volume has to be limited to around 55 to 60 TB, we have decided to
run the crawl in one step instead of two, and we have to estimate the maximum
budget we can give to each domain.
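
As a rough starting point for that estimate, the sketch below divides the
volume limit evenly across the registrar domains. It assumes decimal
terabytes (10^12 bytes) and ignores the 29,000 additional URLs; the function
name is hypothetical and this is not how the budget is actually computed.

```python
# Back-of-the-envelope average: total volume split evenly over all domains.
# Assumptions (not from the message): 1 TB = 10**12 bytes, even split.
TB = 10**12
MB = 10**6

def average_budget_bytes(total_tb, domains):
    """Return the average number of bytes available per domain."""
    return total_tb * TB / domains

DOMAINS = 4_400_000  # registrar domains mentioned above

for total_tb in (55, 60):
    per_domain_mb = average_budget_bytes(total_tb, DOMAINS) / MB
    print(f"{total_tb} TB over {DOMAINS:,} domains: "
          f"about {per_domain_mb:.1f} MB per domain")
```

This gives an average of about 12.5 MB to 13.6 MB per domain. The maximum
budget can presumably be set higher than this average, since many domains
will not use their full budget.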
Best regards,
The BnF digital legal deposit team