[Netarchivesuite-curator] BnF NAS update for December

alexandre.chautemps at bnf.fr
Tue Dec 8 18:18:24 CET 2020


Dear all,

Our annual broad crawl ended on 7 November. It lasted 32 days, 
executed 1037 jobs, and crawled 2.455 billion URLs, for a total size of 
117.59 TB (compressed).

The French newspaper Liberation contacted our team to inform us that their 
blog platform (https://www.liberation.fr/blogs,26) would be closed in the 
course of December. The platform hosts more than 300 blogs. We launched 
an emergency crawl last week to capture and preserve these blogs.

We are working on the full-text indexing (with Solr) of our COVID-19 
crawl, performed between February and July 2020 and covering the first 
wave of the pandemic. The size of this collection is about 15 TB 
(compressed). The new collection will be put into production during December 
and will be available to readers through the Archives de l'internet Labs 
interface.

Best regards,

The BnF digital legal deposit team