[Netarchivesuite-curator] BnF NAS update for July

annick.lefollic at bnf.fr annick.lefollic at bnf.fr
Tue Jul 12 10:36:01 CEST 2016

Hello all,

Each year, the different sections of the BnF legal deposit department 
present an overview of the documents they have received. L’Observatoire du dépôt 
légal : reflet de l’édition contemporaine is now available online (in 
French only):
It provides analysis and raw data from 2015 on seed domains (more than 
900,000 have appeared since the previous year and more than 500,000 have 
disappeared), on formats, on HTTP response codes, on the biggest harvested 

This month we also have several project crawls on different themes.

Among these project crawls, the annual one dedicated to French Official 
Publications is still in progress, with a few new aims. Launched in 
mid-June, it includes a sample of the social web presence of the central 
administration, following the decision to add the social media accounts of 
ministers and public bodies. Unfortunately this does not include Facebook 
pages, because of the now well-known problem of captchas, but the goal 
is to reflect a type of official communication that was previously not 
well covered in our selections. The frequency of the crawls of these 
channels for promoting official publications and administrative and 
political communication could be increased in the future. The traditional 
aim of collecting the "classic" online publications is still relevant, 
with more than 800 seed URLs of traditional websites, each crawled with a 
budget of 100,000 URLs.

Our annual crawl of auction houses has just finished. The scope of the 
collection is the same as in previous years, but last year the platform 
auction.fr, which represents about a third of the crawl, blocked access by 
our robots. The librarian in charge of the selection contacted the site 
owner, who was happy to let us crawl the site, and the quality seems much 
better this year. We also have to be careful, as the majority of the sites 
are hosted on two platforms (auction.fr and Drouot) and their catalogues 
and images are stored on a small number of hosts, so we have to increase the 
budget for these hosts to collect as much as possible.

We are also maintaining our "Solidarities" crawl with the same scope as 
last year, though we have also included sites that were selected for an 
emergency crawl on the refugee crisis.

Best regards,

The BnF digital legal deposit team