[Netarchivesuite-curator] (Erratum) BnF NAS update for April

auriane.quoix at bnf.fr
Fri Apr 2 19:16:20 CEST 2021


Dear all,

We are pleased to announce that, last month, we published the seed lists 
of our selective crawls on the new version of the BnF website dedicated to 
APIs and datasets. These lists are built from BCWeb exports and include 
some crawl settings and descriptive elements such as themes and keywords.
Three new crawls launched in 2020 have been added to the website: 
Instagram, Artificial Intelligence and Environmental Issues.
You can consult all these lists at this address: 
https://api.bnf.fr/fr/liste-des-adresses-url-des-collectes-ciblees-du-web-francais-par-la-bnf

Another page, focused on our Covid-19 selections, can be consulted at 
this address: https://api.bnf.fr/fr/node/176

For the second consecutive year, we have launched an Instagram crawl. We 
plan to run five Instagram crawls, some of them on specific subjects such 
as the Olympic Games or the regional and departmental elections in 
France.
Just like last year, we had to crawl picuki.com: in spite of many tests, 
we always end up being blocked by Instagram.

Finally, our in-house harvesting workshop on Flash is coming to an end. 
It was difficult to find a way to automatically harvest some of the 
websites with Flash animations, because some URLs are dynamically 
generated or relative and are therefore inaccessible to Heritrix. We 
will therefore try to discover all the URLs manually and launch the 
harvest in a second step.
Even when a crawl succeeds, we will sometimes run into compatibility 
issues between the Flash plugin and the Wayback.
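
As a rough illustration of the manual URL discovery described above (this 
is not our actual tooling), relative URLs collected by hand can be resolved 
against the address of the page that embeds the Flash animation, giving 
absolute URLs that a crawler can take as seeds. The function name 
build_seed_list and the example.org addresses are hypothetical; only the 
Python standard library is used.

```python
from urllib.parse import urljoin, urldefrag


def build_seed_list(base_url, collected_paths):
    """Resolve hand-collected (possibly relative) URLs against base_url.

    Hypothetical helper for illustration: returns absolute URLs without
    fragments, in first-seen order, with duplicates removed.
    """
    seeds = []
    seen = set()
    for path in collected_paths:
        path = path.strip()
        if not path:
            continue  # skip blank lines in the hand-made list
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        absolute, _fragment = urldefrag(urljoin(base_url, path))
        if absolute not in seen:
            seen.add(absolute)
            seeds.append(absolute)
    return seeds


if __name__ == "__main__":
    # Made-up example values, not real harvested URLs
    for url in build_seed_list(
        "http://example.org/site/index.html",
        ["anim/intro.swf", "../media/menu.swf", "anim/intro.swf"],
    ):
        print(url)
```

The resulting list could then be reviewed and submitted as seeds for the 
second-step harvest.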

Best regards,
The BnF digital legal deposit team
