[Netarchivesuite-curator] Update from KB/SB

Sabine Schostag sas at statsbiblioteket.dk
Mon Sep 19 15:47:06 CEST 2016

Dear all,
Here is an update from Netarchive:

Broad crawl

·         Last week we launched the third broad crawl of 2016. The crawl limit per domain will be at most 100 MB. There will be special crawls for ministries and government bodies, and for very large sites (e.g. dr.dk).

·         We will try to get in touch with the website owners and web hosting providers who are blocking our crawler (about 11% are blocking us).

Event crawl

·         The event collection for the Olympics in Rio 2016 will continue until the end of the Paralympics 2016.

Selective crawls

·         We are working on the configuration of the regional/local news media crawls.

·         Facebook

o   We have test-crawled about 60 Danish Facebook profiles with Archive-It. We are analyzing how much we get from the profiles. We have to renew our account with Archive-It after the end of November, and we are trying to negotiate a good price.

o   We made a special crawl of Prime Minister Lars Løkke's Facebook profile on 2016.08.30, the day he published his 2025 plan.

Compression of the archive

·         We are preparing to compress the archive, but this has to wait for NAS release 5.3.

Last but not least

Last week we learned that the Ministry of Culture wants KB and SB to merge: from January 2017 we will be “Nationalbiblioteket” (NB ☺) with two locations, in Copenhagen and Aarhus.

