[Netarchivesuite-curator] Summer update from Netarchive

Sabine Schostag sas at statsbiblioteket.dk
Thu Aug 2 15:53:02 CEST 2012


Hi!

Just a brief update from Netarchive before I leave for two weeks of trekking in the Alps :)

As our broad crawls have been sped up to last less than two months, we took advantage of the break between two broad crawls:

-      To crawl "very big web sites" (such as the Danish national broadcaster dr.dk and our other main TV station tv2.dk) in depth.

-      To crawl websites of ministries, departments etc. in depth.

-      To capture URLs of YouTube videos about and by political parties.

We started our own event crawl on the Olympics in London: entering URLs into the system, QA and monitoring.

As for our selective crawls: "business as usual" - that is to say: analysis of "candidates" (new sites proposed for selective crawls), QA of selective crawls, monitoring of harvest jobs, and revision of harvest profiles.

Best,
Sabine

SABINE SCHOSTAG
LIBRARIAN - WEB CURATOR
DIRECT 8946 2148

STATSBIBLIOTEKET

VICTOR ALBECKS VEJ 1
8000 AARHUS C

CVR/SE 1010 0682 - EAN 5798000791084



