[Netarchivesuite-curator] NAS update from KB DK

Sabine Schostag sas at kb.dk
Tue May 9 12:02:27 CEST 2017

Dear all.

Here is a brief update on what we are working on:

- We ended up with 22 representative Danish Facebook profiles, which we have started harvesting with Archive-It.

- We started a broad crawl by running step 1 (10 MB/domain), but we have run into some problems, e.g. “Running jobs are not up to date”, submitted jobs stay pending, H3 remote access doesn’t work, and jobs finish but are marked as failed. The crawls for “Ministries and administrative bodies’ sites” and “Ultra big sites”, which are selective crawls of sites that were formerly part of the broad crawls, ran without trouble.

- We are continuing to test BCWeb.

- An administrative issue has to be solved: we have to journalise our external correspondence (with users, website owners, etc.). This is in progress.


From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Friday, April 14, 2017 3:56 PM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for April

Hello all,

As there are presidential elections in May and legislative elections in June in France, we have launched a focused crawl of the French electoral web. This crawl began last October (to cover the primary elections) and will continue until July 2017. The aim of this crawl is to document the election campaigns and votes of the presidential and legislative elections by archiving the online publications (candidates and political party websites, news articles, blog posts, social media reactions, institutional websites…) linked to political life and citizen debate. For the presidential elections, we are working closely with 14 BnF librarians, from the departments of Law, Economy, Politics and Philosophy, History, Human Sciences. Twenty-two legal deposit libraries, based in different French regions and overseas territories, have just started to cover the legislative elections. 610 sites have been crawled for the presidential election so far and the crawl is progressing well, thanks to Heritrix 3.

At the end of April we will start working on the broad crawl. As usual, this will take place in the autumn, but we will be adapting the workflow to the new versions of NAS and Heritrix and to the new architecture (new machines and operating system).

Finally, Géraldine, Sara and Thomas will be attending the NAS Workshop in Vienna at the end of the month; they look forward to seeing those of you who will be there.

Best regards,
The BnF digital legal deposit team
