[Netarchivesuite-curator] Brief update from KB/SB

Sabine Schostag sas at statsbiblioteket.dk
Wed Jan 11 15:43:22 CET 2012


Dear colleagues.

Here is a brief update from Netarkivet.

Broad crawls

We finished our third broad crawl for 2011 on 2011-12-19. It lasted 51 days, and we harvested 27.9 TB / 647,588,162 objects.
Our second broad crawl lasted 59 days, and we harvested 27.5 TB / 630,022,496 objects.

We decided to postpone the first broad crawl of 2012 until February. In the meantime we are running a special harvest of the biggest Danish sites and of the Danish ministries and administrative offices.

In our 2011 broad crawls we harvested a total of 79.3 TB and 1.8 billion documents/URLs. Compared to that, the results of our selective crawls are peanuts ;o)

Selective crawls

We are working on improving our documentation: a draft of an overall collection policy for our selective harvests is ready to be presented to our editorial board.

Event Harvests

From the Netarchive point of view, the end of 2011 wasn't very eventful.

Anyway, we made an effort on intermediality: every year in week 46, which is called a "usual media week" (Nov. 14-20 in 2011), SB collects programs from Danish local and amateur TV and radio stations. This year we decided to run an event harvest at Netarchive of those TV stations that stream their programs. Of course, we could not capture the streams, but as we harvested their sites daily, their program descriptions can supplement our TV/radio collection.

We have just started an event harvest that will last half a year: a harvest of the Danish EU presidency 2012 <http://um.dk/en/politics-and-diplomacy/denmark-in-the-eu/the-danish-eu-presidency-2012/>


We moved to NetarchiveSuite version 3.18.0 on our test platform, and the acceptance test is in progress.

A Wayback Machine, as you know it from archive.org, is ready for our users.

Finally, we are looking forward to solutions for Twitter harvests: we hope a developer's interest in it will push a solution: https://sbforge.org/display/NAS/A+Developer%27s+Perspective+on+Twitter

Best,

Sabine


SABINE SCHOSTAG
LIBRARIAN, WEB CURATOR
DIRECT +45 8946 2148

NETARCHIVE.DK

STATSBIBLIOTEKET

STATE AND UNIVERSITY LIBRARY
VICTOR ALBECKS VEJ 1
8000 AARHUS C
DENMARK

VAT NO. 1010 0682


