[Netarchivesuite-curator] Netarchive NAS update for November/December

Sabine Schostag sas at statsbiblioteket.dk
Tue Dec 8 17:02:35 CET 2015


Hi all.

Here is a brief update on KB/SB curator activities:

The first step of our fourth broad crawl for 2015 started on 13 November. We have set filters to reduce the number of false 200 or 404 response codes and login pages we capture. We estimate that this step will collect about 20 million URLs, or about 1 TB.

Preparations for the migration to H3/NAS 5 are ongoing: we will significantly reduce the number of templates and filters.

We are running two event crawls:

  1.  The Danish EU justice opt-out referendum: our goal is to document how political parties, prominent politicians, relevant organizations and NGOs, as well as a sample of Danish citizens, argue about the subject. We also collect comments and articles from foreign media.

  2.  The European refugee crisis from the Danish point of view: this event crawl supplements our selective crawls, which already cover the media's discussion of the subject.

We have started identifying and registering our so-called special collections. These include collections older than Netarchive as well as (ongoing) separate collections with content we are unable to capture with Heritrix (e.g. YouTube videos). We have created a template/sheet for describing these collections.

Best,
Sabine (on behalf of the Netarchive curators)

Sabine Schostag
Web curator
THE NETARCHIVE
STATE AND UNIVERSITY LIBRARY
Victor Albecks Vej 1
8000 AARHUS C
DENMARK
VAT NO. 1010 0682
___________________________________________
http://netarkivet.dk/in-english/

From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Friday, November 27, 2015 9:18 AM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for November

Hello all,

Following the attacks of November 13th, the digital legal deposit team decided to launch an event crawl to capture a sample of reactions on the web (tributes, support, analysis, reactions). In addition to our usual harvest of free and subscription-based news sites, which have been covering the events, we also crawled 18 new sites (such as https://www.facebook.com/PoliceNationale/ or http://www.defense.gouv.fr) and 43 Twitter accounts or hashtags (such as https://twitter.com/Place_Beauvau or http://twitter.com/hashtag/attackparis) from November 16th to November 20th. Capturing Twitter four times a day and the other sites once a day over that week yielded 1.5 million URLs, for a total volume of 34.7 GB.

We have further crawls planned for November and December. This electoral year in France is somewhat special, for two reasons: French citizens are being called to vote at an unusual time of year, December, and these are the first regional elections since a major reorganisation of the regions, which reduces their number from 26 to 17. However, the need to preserve ephemeral born-digital electoral content, and the way the BnF's digital legal deposit handles this crawl, remain the same year after year.

Therefore, we are once again working with external partners to select sites. Since the end of September, librarians from the BnF collection departments and from 21 regional libraries in France have been selecting websites to archive. Each library's curator team selects sites relevant to its local electoral campaign, using a common selection framework and the BnF's selection tool "BCWeb". This system has been used for every election since 2010 and ensures strong continuity in the nature of the electoral collections in the web archive. The seed lists will be crawled daily from 1 to 31 December. We will send you the detailed results of the crawl in a future update.

As you are probably aware, France will host and chair the 21st Conference of the Parties to the United Nations Framework Convention on Climate Change (COP21 / CMP11), from November 30 to December 11. The BnF will organize various events and conferences to show its involvement in this debate. We therefore decided to contribute an event crawl, with an allocated budget of 0.5 TB. Our team will ask BnF librarians and partners to send seeds, and we will crawl them twice (before and after this period).

Finally, our broad crawl for 2015 has finished; it took only six weeks, much less time than we had expected. We will send more details in our December update.

Best regards,
The BnF Digital Legal Deposit team