[Netarchivesuite-curator] BnF NAS Update for September

peter.stirling at bnf.fr peter.stirling at bnf.fr
Tue Sep 4 10:31:37 CEST 2012

Hello all,

Here is our update for September.

This summer, BnF launched a new type of harvest. We observed that blog 
platforms were poorly represented in our broad crawl because of the 
small budget allocated to each domain. So we prepared a selective crawl 
of 16 well-known French platforms (such as free.fr, skyrock.com, 
typepad.com). We extracted the names of sites hosted on these domains 
from all the host reports held in NAS (that is, reports from 2010 to 
2012) and kept only those that are still active. This gave us a list of 
430,000 seeds, which we harvested over a period of two weeks. We still 
need to carry out quality assurance.
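
For curators who want to try something similar, the extraction step 
described above could be sketched roughly as follows. This is only an 
illustration, not the script we used: it assumes each host report line 
is whitespace-separated with the hostname in the third column, and the 
names PLATFORMS, hosts_on_platforms and to_seeds are hypothetical. The 
check that a site is still active is not shown.

```python
from typing import Iterable, List, Set

# Three of the 16 platforms named in the message; the full list is not given.
PLATFORMS = ("free.fr", "skyrock.com", "typepad.com")


def hosts_on_platforms(report_lines: Iterable[str],
                       platforms: Iterable[str] = PLATFORMS) -> Set[str]:
    """Collect hostnames that are subdomains of one of the platforms.

    Assumption: data lines look like "<urls> <bytes> <host> ...";
    lines starting with '[' are treated as column headers and skipped.
    """
    suffixes = tuple("." + p.lower() for p in platforms)
    found: Set[str] = set()
    for line in report_lines:
        fields = line.split()
        if len(fields) < 3 or line.lstrip().startswith("["):
            continue  # header, blank, or malformed line
        # Normalise: lower-case, drop any port and trailing dot.
        host = fields[2].lower().split(":")[0].rstrip(".")
        if host.endswith(suffixes):
            found.add(host)  # the set removes duplicates across reports
    return found


def to_seeds(hosts: Iterable[str]) -> List[str]:
    """Turn hostnames into a sorted list of seed URLs."""
    return sorted("http://" + h + "/" for h in hosts)
```

Feeding the lines of every report from 2010 to 2012 through 
hosts_on_platforms gives one deduplicated set of hostnames, which 
to_seeds turns into a seed list.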

Best regards,

The BnF digital legal deposit team
