[Netarchivesuite-curator] Brief update from netarkivet prior to developers' meeting today

Sabine Schostag sas at statsbiblioteket.dk
Tue Feb 14 09:59:04 CET 2012


Dear all!

Find our update at: https://sbforge.org/display/NAS/2012-02-14+Statusmeeting

Or read it here:

Lego had a game zone on the web called Lego Universe, which was closed recently (2012-01-30). In order to document Lego Universe, we ran some extra crawls of lego.com and of videos from YouTube.com that were embedded in Lego Universe.
Quite a few web pages are displayed differently by different browsers and browser versions. We studied some examples closely and documented the results in our wiki.
Two of the most important sites with display problems are Facebook and Twitter. The crawl log tells us that most of the given URLs are harvested, but the viewerproxy does not show them at all.
As you know, Heritrix cannot harvest streamed video or sound. As for sound, we succeeded in harvesting MP3 files from RSS feeds from a page with radio streams. We used a template with an XML extractor.
Upgrading to NAS 3.18 caused us some trouble - NAS was down for about a week.

Best,
Sabine

SABINE SCHOSTAG
LIBRARIAN, WEB CURATOR
DIRECT +45 8946 2148

NETARCHIVE.DK

STATSBIBLIOTEKET

STATE AND UNIVERSITY LIBRARY
VICTOR ALBECKS VEJ 1
8000 AARHUS C
DENMARK

VAT NO. 1010 0682
