[Netarchivesuite-curator] BnF NAS update for February
peter.stirling at bnf.fr
Mon Feb 16 11:16:38 CET 2015
Dear all,
After the dreadful attacks that took place in Paris on the 7th and 9th of
January and the events that followed, we decided to launch an emergency
crawl to harvest web resources (news articles, blog posts, social media
reactions, institutional websites, etc.) related or reacting to them. We
appealed to IIPC members and to our BnF network of librarians, asking them
to help us quickly gather references so that we could build the most
complete and relevant seed list possible. Given the exceptional nature of
the event, the scope and selection criteria were extended to an
international scale, with the aim of covering the diversity of the
reactions and the different forms they took. We received 2,480 URLs from
eighteen IIPC members and 1,740 URLs nominated by more than 70 BnF
librarians. In addition to these selections, our existing seed lists of
French governmental, news, political and activist websites were given a
dedicated harvest. Finally, our regular daily and weekly harvests of the
main French news sites, which were particularly relevant during those
days, continued as usual.
Technically, the crawls ran from 8th to 16th January 2015, and each website
was crawled at least once to a depth of the seed page plus one click. During
the same period, selected Twitter accounts and popular hashtags (such as the
now famous #JeSuisCharlie) were crawled four times a day. In total, 15.9
million URLs were collected, amounting to 0.5 TB of data.
Best regards,
The BnF digital legal deposit team