[Netarchivesuite-curator] NAS update from KB DK
sas at kb.dk
Tue Sep 10 09:51:15 CEST 2019
Here is an update on our summer activities and on our plans:
* We ran a mini event harvest on Trump's plan to buy Greenland from the Danish Queen, especially Twitter activity and reactions from foreign media.
* After our 2nd broad crawl for 2019, which finished in May, we reworked the results and changed some configurations. We started our 3rd broad crawl on 1 September; step 1 has a limit of 50 MB, and step 2 will have a limit of 16 GB. Simultaneously with step 1, we started a run of "ultra big sites", "OAI-extraction (research databases)", "municipalities and regions", "ministries and administrative bodies" and YouTube videos.
* One of the most important problems we have observed with the selective crawls is js-lazy-load: images are not displayed or, worse, not even captured.
From our ongoing projects:
* We are looking forward to implementing the new features for BCWeb, so that we can continue building up an external community to help us with the collection work using BCWeb.
* We are going to rethink our collection strategy within the frame of the general collection strategy for the digital cultural heritage.
* We are investigating a solution with only one online copy of Netarchive.
* Hopefully we will soon be allocated more IT resources, so that we can continue implementing browser-based harvesting in our production system. Umbra is still not fully in place.
* There are still some issues to be solved before we can implement SolR wayback in our frontend, especially legal issues in connection with GDPR.
On behalf of the Netarchive Team