[Netarchivesuite-curator] Update from Netarchive

Sabine Schostag sas at kb.dk
Tue Mar 7 09:38:28 CET 2017

Dear all.

Here is a brief update on our activities:

NAS 5.2.2
NAS 5.2.2 is now running on our production system. Metadata will be compressed. We will not run deduplication until the whole archive has been compressed.

Social Media
The analysis of how to crawl selected social media is ongoing. We are looking at Facebook, Twitter, YouTube, Vimeo, Instagram, Soundcloud, Reddit, Flickr, Vine, Pinterest and LinkedIn. We have already decided not to collect Snapchat, Google+ or Bandcamp.

We are going to test BCWeb to find out whether we can use it to get help from external curators.

We have upgraded our Citrix access software and solved problems with user categories.

Dialogue with web hotels that block our harvester
We have started a dialogue with the web hotels (web hosting providers) that are blocking our harvester, in order to find a solution that will make them lift the block.

Some statistics
Data volume (as of 5 March 2017)
Total GB/TB in the archive (1024-based units): 793544/774
Number of GB/TB Broad crawls and ultra-big sites: 634638/619 and 62346/60
Number of GB/TB Selective crawls: 97198/94
Number of GB/TB Event crawls: 35387/34
(Excluding metadata files and test crawl files)
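
For readers who want to check the figures: the TB values appear to be the GB values divided by 1024 and rounded down. That rounding rule is an inference from the numbers themselves, not something stated in this update. A minimal Python sketch, where the function name gb_to_tb and the dictionary labels are only illustrative:

```python
def gb_to_tb(gb: int) -> int:
    """Convert GB to whole TB using 1024-based units, rounding down (assumed rule)."""
    return gb // 1024


# GB figures copied from the statistics in this update.
reported_gb = {
    "Total": 793544,
    "Broad crawls": 634638,
    "Ultra-big sites": 62346,
    "Selective crawls": 97198,
    "Event crawls": 35387,
}

# Derive the TB figure for each category and print both values.
derived_tb = {name: gb_to_tb(gb) for name, gb in reported_gb.items()}

for name, tb in derived_tb.items():
    print(f"{name}: {reported_gb[name]} GB = {tb} TB")
```

Under that assumption the sketch reproduces every TB figure listed above.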

