[Netarchivesuite-curator] KB DK update for January/February

Sabine Schostag sas at kb.dk
Tue Feb 5 12:18:16 CET 2019


Dear all.

Here is an update from Netarchive.

Broad crawls
We started our first broad crawl for 2019 on January 26: step 1, with a limit of 10 MB per domain. We have withdrawn a number of sites from the normal broad crawl; they are crawled simultaneously in three definitions: “ultra big sites”, “OAI extraction” (research databases) and “ministries and government agencies”.
Web hosting companies are a big issue for our broad crawls. To avoid being blocked by them, we make agreements with the hosting companies and set up throttling so that we do not overload their servers.

Selective crawls
We are focusing on getting content behind paywalls by negotiating for IP validation. Paywalls are an issue for almost all national news media, and we will miss essential content if we cannot harvest what lies behind them.

Event crawls
We will have parliamentary elections this year, no later than the end of June or the beginning of July. We are preparing our strategy, both for the parliamentary elections and for the elections to the European Parliament, which will take place on 26 May in Denmark.

Access forms and procedures
We are trying to set up a more user-friendly procedure for getting access to our archived content.

Netarchive and GDPR
We are checking all our procedures to be sure that we comply with the new European General Data Protection Regulation (GDPR). We have made changes to Google Analytics on netarkivet.dk; we now collect only user data allowed by GDPR.

Last but not least: we have a new colleague working with web archiving, Kristian Bak. He will be with us at the meeting today.

On behalf of the Netarchive Team

Best,
Sabine
