[Netarchivesuite-curator] Monthly update from KB DK

Sabine Schostag sas at kb.dk
Tue May 7 12:17:33 CEST 2019

Dear all,

Here is a brief update from Netarchive:

We have started our second broad crawl for 2019. The first step has a byte limit of 50 MB per domain; the second step will have a byte limit of 16 GB per domain. We adjusted the byte limits after analyzing last year's broad crawls.

"Ultra big sites", "OAI-extraction (research databases)", "Ministries and administrative bodies" and YouTube crawls are running simultaneously with the broad crawl.

We have started a selective crawl for the approaching parliamentary elections. We had hoped that they would take place together with the elections for the European Parliament, but they will not. At the monthly curator meeting tomorrow, we will have to agree on how to deal with the European Parliament elections.

We prepared our list of politicians' Facebook profiles and fixed the URLs as BNE does for the crawl with our Archive-IT account.

We upgraded UMBRA in the production system.

Otherwise, business as usual. By the way, we are still struggling with very slow access (wayback) systems.

On behalf of the Netarchive team
Best, Sabine
