[Netarchivesuite-curator] BnF NAS update for September

peter.stirling at bnf.fr
Thu Sep 8 09:56:23 CEST 2016

Hello all,

We are continuing to work on this year's broad crawl. We are preparing 
nas-preload, the tool used to combine the different sources into a single 
list to be loaded into NAS. This step also includes a DNS check to avoid 
slowing down the crawl with domains that do not have a DNS response. This 
year, in addition to excluding domains with no DNS response, we are also 
excluding those that give an "unknown" response, since we know from 
previous years that these domains generally have no content. Overall, the seed list will 
contain around 4.4 million active domains, and will have improved 
coverage of the different regional TLDs: .alsace, .paris, .bzh (for 
Brittany) and the French West Indies.
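
For readers who want a concrete picture of the preparation step described above, here is a minimal sketch in Python of merging several seed sources and filtering on a DNS check. It is not the nas-preload code: the function names, the status labels and the mapping of resolver errors to "no DNS" versus "unknown" are all assumptions made for illustration.

```python
import socket
from typing import Callable, Iterable, List

# Hypothetical status labels; the real tool's categories may differ.
OK, NO_DNS, UNKNOWN = "ok", "no_dns", "unknown"


def merge_sources(*sources: Iterable[str]) -> List[str]:
    """Combine several seed sources into one de-duplicated list of domains."""
    seen = set()
    merged = []
    for source in sources:
        for raw in source:
            # Normalise case, surrounding whitespace and a trailing root dot.
            domain = raw.strip().lower().rstrip(".")
            if domain and domain not in seen:
                seen.add(domain)
                merged.append(domain)
    return merged


def dns_status(domain: str) -> str:
    """Classify a domain by its DNS lookup result (assumed mapping)."""
    try:
        socket.getaddrinfo(domain, None)
        return OK
    except socket.gaierror as exc:
        # Assumption: "name not found" counts as no DNS; any other
        # resolver failure is treated as an "unknown" response.
        return NO_DNS if exc.errno == socket.EAI_NONAME else UNKNOWN
    except (OSError, UnicodeError):
        return UNKNOWN


def filter_seeds(domains: Iterable[str],
                 check: Callable[[str], str] = dns_status) -> List[str]:
    """Keep only domains that resolve; drop both no-DNS and unknown ones."""
    return [d for d in domains if check(d) == OK]
```

A call such as filter_seeds(merge_sources(list_a, list_b)) would then give the list to load. For a list of several million domains the lookups would need to be run in parallel or in batches, which this sketch leaves out.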

Turning to project crawls, the 2016 Olympiad is now over but our Olympics 
crawls are still running. The project, in line with the previous 
collaborative collections documenting the 2014 Sochi Winter Games and 
2012 London Summer Games, involves seven curators from the Literature and 
Art department who work on the selection based on eight themes. Two crawls 
were planned, before and after the games, covering a list of 558 seeds. 
Concerning social media, we focused only on Twitter, with 447 French 
accounts or hashtags collected twice a day from the 4th to the 24th of 
August. These crawls will be complemented by one for the Paralympic games, 
to be launched on the 18th of September. We have also communicated our 
list of seeds for the worldwide collaborative collection led by the 
British Library for IIPC.

Best regards,
the BnF digital legal deposit team
