[Netarchivesuite-curator] Update from KB DK

Sabine Schostag sas at kb.dk
Sat Nov 4 16:39:23 CET 2017

From: Sabine Schostag
Sent: 3 November 2017 10:49
To: netarchivesuite-curator-bounces at ml.sbforge.org
Subject: Update from KB DK

Dear all,

Here is an update from KB DK:

Our third broad crawl ran from September 13 to September 25 with a budget of 10 MB per domain; that is to say, we ran our usual step 1. Due to unsolved problems with H3, we will not be able to run step 2 (budget normally 100 MB).

We are preparing the event crawl of the local and regional elections on November 21. As our selective crawls cover the news media part of the elections, we will exclude them from the event crawl. We talked about using the last broad crawl for 2017 as a “backup” for the event crawl by starting it just after election day. But as we won’t be able to run an in-depth broad crawl before spring 2018, this is not an option. In any case, the focus will be on social media (Twitter, Facebook, YouTube), NGOs, companies, and other stakeholders.

We hope to get hints and help from the two-day Social Media workshop. The first day was very fruitful. It focused on how to identify relevant profiles, content, etc. on Twitter and Facebook. The second day will be about capturing content (APIs etc.). After the second day (on Monday), I’ll share any information that would be useful for you.

We implemented BCWeb in our production system; our intention is to use it for the election event crawl. But there are still some open questions. In particular, the transfer in connection with our new way of building configurations (which do not include hops) is a big issue to be solved.

We started testing BNF’s NAS preload tool for the activation/deactivation of domains and the cleanup of their seeds for the broad crawls.

Our Webdanica project (automatically finding Danish content on TLDs other than .dk by capturing outlinks from domains archived in Netarchive) is almost ready to go into production. If you have any questions about this project, Stephen or Tue will be able to tell you more about it at our next meeting on Tuesday.

Have a nice Friday and a nice weekend :)



Sabine Schostag


Web curator



+45 8946 2148

sas at kb.dk

Det Kgl. Bibliotek

Royal Danish Library

Victor Albecks Vej 1

DK-8000 Aarhus C

+45 3347 4747

CVR 2898 8842

EAN 5798 000 792142

