[Netarchivesuite-curator] BnF NAS update for July
auriane.quoix at bnf.fr
Mon Jul 4 20:30:16 CEST 2022
Dear all,
Last week, we launched our "Auction house" crawl, which covers the websites
of French auction houses. About 200 websites were selected. Last year, we
were blacklisted by large auction sites, so this time we set up a specific
harvest system for auction.fr, which hosts many websites. Before starting
the harvest, we added filters to all the other jobs in progress, and we
created a special queue management that groups the URLs of all the hosts
belonging to a website into a single queue. This makes it possible to avoid
sending too many simultaneous requests to the same website and to limit the
harvest to 100 000 URLs per website.
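To illustrate the principle of the queue grouping and per-website cap, here
is a toy Python sketch. It is not NetarchiveSuite or Heritrix configuration:
the host-to-website mapping, the class and function names and the example
hosts are all hypothetical. Every website gets exactly one queue, however
many hosts serve it; each queue accepts at most a fixed number of URLs; and
each round takes at most one URL per website.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit

MAX_URLS_PER_SITE = 100_000  # per-website cap mentioned in the update


def queue_key(url, host_to_site):
    """Return the queue a URL belongs to: the website that owns its host.

    host_to_site is a hypothetical mapping from host name to a website
    identifier; hosts missing from the mapping get a queue of their own.
    """
    host = urlsplit(url).hostname or ""
    return host_to_site.get(host, host)


class SiteGroupedFrontier:
    """Toy frontier: one queue per website, with a per-website URL budget."""

    def __init__(self, host_to_site, max_per_site=MAX_URLS_PER_SITE):
        self.host_to_site = host_to_site
        self.max_per_site = max_per_site
        self.queues = OrderedDict()  # website -> deque of pending URLs
        self.accepted = {}           # website -> number of URLs accepted so far

    def add(self, url):
        """Queue a URL unless its website has already used up its budget."""
        key = queue_key(url, self.host_to_site)
        if self.accepted.get(key, 0) >= self.max_per_site:
            return False
        self.accepted[key] = self.accepted.get(key, 0) + 1
        self.queues.setdefault(key, deque()).append(url)
        return True

    def next_batch(self):
        """Take at most one URL from each website queue (round robin), so
        no website receives more than one request per round."""
        batch = []
        for key in list(self.queues):
            q = self.queues[key]
            batch.append(q.popleft())
            if not q:
                del self.queues[key]  # drop queues that are now empty
        return batch
```

With two hosts mapped to the same (hypothetical) website and a budget of two
URLs, the third URL for that website is rejected, and a round returns one URL
per website:

```python
hosts = {"www.example.org": "example", "img.example.org": "example"}
f = SiteGroupedFrontier(hosts, max_per_site=2)
f.add("https://www.example.org/a")      # accepted
f.add("https://img.example.org/b.jpg")  # accepted, same queue as above
f.add("https://www.example.org/c")      # rejected: budget reached
f.add("https://other.example.net/")     # accepted, separate queue
f.next_batch()  # one URL from "example", one from "other.example.net"
```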
The LIFRANUM crawl, carried out in partnership with researchers from Jean
Moulin University Lyon 3 and Lumière University Lyon 2, is about to be
launched.
The project aims to identify and map the corpus of French-language digital
literature (websites, blogs, social networks). About 1100 sites will be
crawled for this harvest, with a specific budget of 15 000 URLs. The
harvest should last about one or two weeks.
Finally, we are continuing the preparations for our 2022 broad crawl.
Best regards,
The BnF digital legal deposit team