[Netarchivesuite-curator] BnF NAS update for February

peter.stirling at bnf.fr
Fri Feb 5 11:36:54 CET 2016


Hello all,

Here is an update on part of our regular activity, the ongoing crawls. The 
seeds are chosen by librarians in the different departments of the BnF, 
and also by partner libraries in Strasbourg and in Montpellier. They cover 
websites we absolutely must have in the main fields of knowledge, and each 
department or library draws up its collection policy for these crawls as 
part of its overall collection policy and within the legal deposit 
framework of web archiving.

In 2015, these crawls covered 14,000 seed URLs, each harvested at a set 
frequency (weekly, monthly, twice a year or annually) and to a set depth 
(domain, host, path, page+2). In total, in 2015 we collected 756 
million URLs, representing 38 TB.
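
For a rough sense of scale, the averages implied by these figures can be worked out as below. This is only a back-of-the-envelope sketch: the totals quoted above are rounded, the message does not say whether "TB" means decimal or binary terabytes (decimal is assumed here), and the function names are illustrative rather than part of NetarchiveSuite.

```python
# Figures quoted in the message above (rounded totals for 2015).
SEED_URLS = 14_000
COLLECTED_URLS = 756_000_000
VOLUME_TB = 38

# Assumption: decimal terabytes (1 TB = 10**12 bytes); the message does not specify.
BYTES_PER_TB = 10**12


def average_bytes_per_url(volume_tb=VOLUME_TB, urls=COLLECTED_URLS):
    """Mean size of one collected URL, in bytes."""
    return volume_tb * BYTES_PER_TB / urls


def average_urls_per_seed(urls=COLLECTED_URLS, seeds=SEED_URLS):
    """Mean number of URLs collected per seed over the year."""
    return urls / seeds


if __name__ == "__main__":
    print(f"{average_bytes_per_url() / 1000:.1f} kB per collected URL")
    print(f"{average_urls_per_seed():,.0f} URLs collected per seed")
```

Under these assumptions this gives roughly 50 kB per collected URL and about 54,000 URLs per seed over the year. Both are averages across all frequencies and depths, so individual seeds will differ widely.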

We will maintain these ongoing crawls in 2016, alongside several project 
crawls (World War I, social movements, international publications, 
solidarity, official publications, Olympic games).

Best regards,
The BnF digital legal deposit team
