[Netarchivesuite-curator] Summer update from Netarchive
Sabine Schostag
sas at statsbiblioteket.dk
Thu Aug 2 15:53:02 CEST 2012
Hi!
Just a brief update from Netarchive before I leave for two weeks of trekking in the Alps :)
As our broad crawls have been sped up to last less than two months, we took advantage of the break between two broad crawls to:
- Crawl "very big web sites" (such as the Danish national broadcaster dr.dk and our other main TV station tv2.dk) in depth
- Crawl websites of ministries, departments etc. in depth
- Capture URLs of YouTube videos on and by political parties
We started our own event crawl on the Olympics in London: entering URLs into the system, QA and monitoring.
As to our selective crawls: "business as usual" - that is to say: analysis of "candidates" (new sites proposed for selective crawls), QA of selective crawls, monitoring of harvest jobs, and revision of harvest profiles.
Best,
Sabine
SABINE SCHOSTAG
LIBRARIAN - WEB CURATOR
DIRECT 8946 2148
STATSBIBLIOTEKET
VICTOR ALBECKS VEJ 1
8000 AARHUS C
CVR/SE 1010 0682 - EAN 5798000791084