[Netarchivesuite-curator] April update from KB/SB

Sabine Schostag sas at statsbiblioteket.dk
Tue May 10 09:55:40 CEST 2016


Dear all,

Here is our monthly update:


  *   We have moved our production site to NAS 5 with Heritrix 3 (H3).
  *   We will start the second broad crawl of 2016 as soon as NAS 5 and Heritrix 3 are running “smoothly”.
  *   The event crawl on the refugee crisis is still ongoing. As it is a supplement to our selective news media and social media crawls, it is a very small event crawl.
  *   We are preparing for a new event crawl on the European Capital of Culture project “Aarhus 2017”, and we are looking at different scenarios for this event crawl.
  *   We are still unable to harvest anything from Facebook.
  *   We are revising our collection strategy: there will be fewer broad crawls and more selective crawls. At the moment we are looking at the selective news media crawls. Given our resources, we need a more streamlined approach for an extended number of domains to be crawled.
  *   The social platform arto.com will be closed down on June 1st. We were offered a private crawl of the entire site (no WARC files, but likely WARC compatible). We decided to decline and to do a last crawl of the entire site on our own.
  *   We are working on a business model (legal and financial issues) for providing corpora from Netarchive to research institutions. Our first customer will be the University of Southern Denmark.
Talk to some of you later.

Kind regards,
Sabine
