[Netarchivesuite-curator] Monthly update from KB DK
sas at kb.dk
Tue Dec 4 10:47:59 CET 2018
Here is a brief update on Netarchive’s production activities:
Step 2 of our third broad crawl (with a data limit of 14 GB per domain) is still ongoing and is progressing rather slowly. One possible reason is the growing centralization of web hosting sites. We also have problems with scheduling and running jobs and with monitoring the broad crawls in “GUI open”.
We often run into problems that we cannot solve without developers’ assistance, e.g.:
- IP-validated access to content behind paywalls (the website owner claims to have established the access, but it does not work)
- Quite a few websites are blocking our crawlers even though they are obliged to give access under the legal deposit law
We ran our mini-event harvest “Week 46”: websites of local broadcast stations (both radio and television).
We had a follow-up to the special crawl for the manhunt by Danish police on 28 September, when the Danish Secret Service (PET) revealed that the Iranian secret service had been prevented from carrying out an assassination on Danish soil. We crawled foreign news media articles on the revelation.
On behalf of the Netarchive team