[Netarchivesuite-curator] BnF NAS update for December

alexandre.chautemps at bnf.fr alexandre.chautemps at bnf.fr
Mon Dec 14 17:06:06 CET 2020


Hi Anders,

Thank you for your answer and for the links you shared with us.
We have no investigations in progress on SolrWayback, but it could be 
a very interesting project to initiate in the future.

Best regards,

Alexandre



From:    "Anders Klindt Myrvoll" <ANKM at kb.dk>
To:      "alexandre.chautemps at bnf.fr" <alexandre.chautemps at bnf.fr>, 
"netarchivesuite-curator at ml.sbforge.org" 
<netarchivesuite-curator at ml.sbforge.org>
Cc:      "bert.wendland at bnf.fr" <bert.wendland at bnf.fr>, "DDL_DLN at bnf.fr" 
<DDL_DLN at bnf.fr>, "leslie.bellony-ext at bnf.fr" <leslie.bellony-ext at bnf.fr>
Date:    09/12/2020 08:42
Subject: RE: [Netarchivesuite-curator] BnF NAS update for December



Hi Alexandre
 
Thanks for the update – really impressive broad crawl.
 
We are experiencing a slow broad crawl this time – and we are 
investigating further. 
 
We are still waiting to use SolrWayback for our archive (Solrindex) – 
hopefully this month.
 
In the meantime, everything is available as open source here:
https://github.com/netarchivesuite/solrwayback/releases/tag/4.0.5 
(prerelease)
https://github.com/netarchivesuite/solrwayback
 
It might be interesting for you to see the covid-19 collection through 
SolrWayback once you have indexed everything. I’ll be happy to show you 
some of the new features.
 
Best,
Anders
 
From: Netarchivesuite-curator 
<netarchivesuite-curator-bounces at ml.sbforge.org> On Behalf Of 
alexandre.chautemps at bnf.fr
Sent: Tuesday, December 8, 2020 6:18 PM
To: netarchivesuite-curator at ml.sbforge.org
Cc: bert.wendland at bnf.fr; DDL_DLN at bnf.fr; leslie.bellony-ext at bnf.fr
Subject: [Netarchivesuite-curator] BnF NAS update for December
 
Dear all,

Our annual broad crawl ended on 7 November. It lasted 32 days, 
executed 1037 jobs, and crawled 2.455 billion URLs for a total size of 
117.59 TB (compressed).

The French newspaper Liberation contacted our team to inform us that their 
blog platform (https://www.liberation.fr/blogs,26) would be closed in the 
course of December.  The platform hosts more than 300 blogs. We launched 
an emergency crawl last week to crawl these blogs and preserve them.

We are working on the full-text indexing (with Solr) of our covid-19 
crawl, performed between February and July 2020 and covering the first 
wave of the pandemic. The size of this collection is about 15 TB 
(compressed). The new collection will be put into production during 
December and will be available to readers through the GUI Archives de 
l'internet Labs.

Best regards,

The BnF digital legal deposit team

Partial opening of the research rooms
The general public library (Haut de jardin) and the exhibitions remain 
closed.
The research rooms are open to readers holding a research pass, by 
reservation only and exclusively for consulting reserved works (see 
details here), from 24 November, Tuesday to Friday, 10 a.m. to 5 p.m.
Before printing, think of the environment.
