[Netarchivesuite-curator] April update from KB/SB
sas at statsbiblioteket.dk
Tue May 10 09:55:40 CEST 2016
Hereby our monthly update:
* We have moved our production site to NAS 5 with Heritrix 3 (H3)
* We will start the second broad crawl 2016 as soon as NAS 5 and Heritrix 3 are running “smoothly”
* The event crawl on the refugee crisis is still ongoing: As it is a supplement to our selective news media and social media crawls, it is only a very small event crawl.
* We are preparing for a new event crawl on the European Capital of Culture project “Aarhus 2017”: we are looking at different scenarios for this event crawl
* We are still unable to harvest anything from Facebook.
* We are revising our collection strategy: There will be fewer broad crawls and more selective crawls. At the moment we are looking at the selective news media crawls. Given our resources, we need a more streamlined approach for an extended number of domains to be crawled
* The social platform arto.com will be closed down on June 1st. We were offered a private crawl of the entire site (no WARC files, but likely WARC compatible). We decided to say no thanks and to do a last crawl of the entire site on our own.
* We are working on a business model (legal and financial issues) for giving corpora from Netarchive to research institutions. Our first customer will be the University of Southern Denmark.
Talk to some of you later