[Netarchivesuite-curator] (Erratum) BnF NAS update for April
auriane.quoix at bnf.fr
Fri Apr 2 19:16:20 CEST 2021
Dear all,
We are pleased to announce that, last month, we published the seed lists of
our selective crawls on the new version of the BnF website dedicated to APIs
and datasets. These lists are generated from BCWeb exports and include some
crawl settings as well as descriptive elements such as themes and keywords.
In 2020, three new crawls were launched and added to the website:
Instagram, Artificial Intelligence and Environmental Issues.
You can consult all these lists at this address:
https://api.bnf.fr/fr/liste-des-adresses-url-des-collectes-ciblees-du-web-francais-par-la-bnf
Another page, focused on the Covid-19 selections, can be consulted at
this address: https://api.bnf.fr/fr/node/176
For the second consecutive year, we launched an Instagram crawl. We plan
to run five Instagram crawls, some of them on specific subjects such as
the Olympic Games or the regional and departmental elections in France.
As last year, we had to crawl picuki.com instead: in spite of many tests,
we always end up being blocked by Instagram.
Finally, our in-house harvesting workshop on Flash is coming to an end.
It was difficult to find a way to automatically harvest some of the
websites with Flash animations, because some URLs are dynamically
generated or relative, which makes them inaccessible to Heritrix. We will
therefore try to identify all the URLs manually and launch the harvest in
a second step.
Even when the crawl succeeds, we will sometimes face compatibility issues
between the Flash plugin and the Wayback.
Best regards,
The BnF digital legal deposit team
Due to government guidelines related to the health situation, exhibitions remain closed until further notice. Cultural events cannot welcome the public but are largely broadcast online. The library for all audiences is open Tuesday to Friday from 10 a.m. to 5 p.m.
The research libraries are open at the François-Mitterrand site on Mondays from 2 p.m. to 5 p.m. and Tuesday to Friday from 10 a.m. to 5 p.m., and at the Richelieu, Arsenal and Opéra sites from 10 a.m. to 5 p.m., Monday to Friday. See the access conditions. Please consider the environment before printing.