[Netarchivesuite-curator] BnF NAS update for December
alexandre.chautemps at bnf.fr
Mon Dec 14 17:06:06 CET 2020
Hi Anders,
Thank you for your answer and for the links you shared with us.
We have no investigations in progress on SolrWayback, but it could be
a very interesting project to start in the future.
Best regards,
Alexandre
From: "Anders Klindt Myrvoll" <ANKM at kb.dk>
To: "alexandre.chautemps at bnf.fr" <alexandre.chautemps at bnf.fr>,
"netarchivesuite-curator at ml.sbforge.org"
<netarchivesuite-curator at ml.sbforge.org>
Cc: "bert.wendland at bnf.fr" <bert.wendland at bnf.fr>, "DDL_DLN at bnf.fr"
<DDL_DLN at bnf.fr>, "leslie.bellony-ext at bnf.fr" <leslie.bellony-ext at bnf.fr>
Date: 09/12/2020 08:42
Subject: RE: [Netarchivesuite-curator] BnF NAS update for December
Hi Alexandre
Thanks for the update, that is a really impressive broad crawl.
Our own broad crawl is running slowly this time, and we are
investigating why.
We are still waiting to start using SolrWayback for our archive (Solr
index), hopefully this month.
In the meantime, everything is available as open source here:
https://github.com/netarchivesuite/solrwayback/releases/tag/4.0.5
(prerelease)
https://github.com/netarchivesuite/solrwayback
It might be interesting for you to view the covid-19 collection through
SolrWayback once you have indexed everything. I'll be happy to show you
some of the new features.
Best,
Anders
From: Netarchivesuite-curator
<netarchivesuite-curator-bounces at ml.sbforge.org> On Behalf Of
alexandre.chautemps at bnf.fr
Sent: Tuesday, December 8, 2020 6:18 PM
To: netarchivesuite-curator at ml.sbforge.org
Cc: bert.wendland at bnf.fr; DDL_DLN at bnf.fr; leslie.bellony-ext at bnf.fr
Subject: [Netarchivesuite-curator] BnF NAS update for December
Dear all,
Our annual broad crawl ended on 7 November. It lasted 32 days,
executed 1,037 jobs, and crawled 2.455 billion URLs, for a total size of
117.59 TB (compressed).
The French newspaper Liberation contacted our team to tell us that its
blog platform (https://www.liberation.fr/blogs,26) would close during
December. The platform hosts more than 300 blogs. Last week we launched
an emergency crawl to capture and preserve these blogs.
We are working on the full-text indexing (with Solr) of our covid-19
crawl, performed between February and July 2020 and covering the first
wave of the pandemic. The collection is about 15 TB (compressed). It
will go into production in December and will be available to readers
through the Archives de l'internet Labs GUI.
Best regards,
The BnF digital legal deposit team
Partial reopening of the research reading rooms
The general-public library (Haut de jardin) and the exhibitions remain
closed.
From 24 November, the research reading rooms are open to readers holding
a research pass, by reservation only and exclusively for consulting
reserved documents (see terms here), Tuesday to Friday, from 10 a.m. to
5 p.m.
Please consider the environment before printing.