[Netarchivesuite-curator] NAS update from Netarchive
Sabine Schostag
sas at statsbiblioteket.dk
Mon Jan 13 21:49:08 CET 2014
Happy New Year to all of you :)
Here is an update from Netarchive:
· We finished our fourth broad crawl for 2013 on 2013-12-27
· We started an event harvest of MGP (Melodie Grand Prix Eurovision de la Chanson), which will take place in Denmark in 2014
· We are harvesting very big Danish sites, such as the sites of the Danish broadcasters DR and TV2. Our broad crawls do not capture these sites completely because of the domain limits, so we crawl them separately about four times a year.
· We are harvesting ministry and government administration sites for the same reason as the very big sites
· We are looking forward to trying out the extended fields for our documentation :)
Best, Sabine
SABINE SCHOSTAG
LIBRARIAN, WEB CURATOR
DIRECT +45 8946 2148
THE NETARCHIVE
STATSBIBLIOTEKET
STATE AND UNIVERSITY LIBRARY
VICTOR ALBECKS VEJ 1
8000 AARHUS C
DENMARK
VAT NO. 1010 0682
From: netarchivesuite-curator-bounces at ml.sbforge.org [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Thursday, December 12, 2013 11:34 AM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for December
Hello all,
We've finished the first stage of our broad crawl, and started the second stage last week. According to our forecasts, we will have just enough space in the storage allocated for our 2013 crawls to complete this second stage, though we have had to limit the budget to 2,300 URLs per domain instead of 10,000 URLs.
Our storage budget will stay the same for next year (100 TB), while we are hoping to expand our broad crawl and have several project crawls planned. As part of planning the programme of crawls for 2014, we are therefore looking at ways to maximise the space we have available, for example by studying files that we collect from external domains, Heritrix errors, and files that we collect in multiple crawls or multiple versions, and by working with curators to make sure the settings in selective crawls are optimised.
Best regards,
The BnF digital legal deposit team