[Netarchivesuite-curator] BnF NAS update for September

auriane.quoix at bnf.fr auriane.quoix at bnf.fr
Mon Sep 5 19:25:33 CEST 2022


Dear all,

First of all, we welcome Nola N'Diaye to our team as harvesting manager 
and assistant head of the digital legal deposit team. She succeeds Pascal 
Tanésie, who is retiring in December.

Last month, nas-preload version 9.1 and NetarchiveSuite version 7.4.1 were 
released. The new version of NAS includes several improvements and 
enhancements that will be useful for monitoring crawls, among them: 
display of the compressed data size of the WARC files produced by each 
running job, distinction of the queue types on the Progression and Queues 
page, and a bug fix allowing a regex containing a backslash to be used in 
Browse/Delete frontier.

We are also going to launch a test broad crawl this week. Our production 
crawl will be launched in October.

The crawl stemming from the LIFRANUM project, which concerns 
French-language digital literature websites, ended last week. 1089 seeds 
(websites, blogs hosted on several platforms such as wordpress.com, 
over-blog.com, etc.) were harvested. We also crawled a few thousand web 
pages of contextual content separately, with a dedicated job. The 
selection step was carried out with Hyphe, a web corpus curation tool 
based on a web crawler.

Finally, the IIPC webinar "Web Archiving the War in Ukraine" took place 
last Wednesday. On this occasion, our colleagues Vladimir Tybin and Anaïs 
Crinière-Boizet presented, with Kees Teszelszky, the "War in Ukraine" IIPC 
collaborative collection led by the BnF and the National Library of the 
Netherlands.

Best regards,

The BnF digital legal deposit team
