[Netarchivesuite-devel] ONB update + outlook

Mayr Michaela michaela.mayr at onb.ac.at
Tue Feb 2 15:22:28 CET 2010


Dear Netarchive and BnF teams, 

  

Following our call, I wanted to give you a brief summary of our recent, ongoing, and planned activities at ONB:

  

Domain crawl: 
We completed stage 1 of our first domain crawl. 895,445 domains were crawled with a limit of 10 MB per domain, and 1.57 TB of data was collected in total. 71% of the domains were smaller than 1 MB; 14% were larger than 10 MB. We received comments from only 3 site owners. Stage 2 of the domain crawl, with a limit of 100 MB per domain, started recently. We have about 7 TB of storage for the domain crawl. If this is not enough to archive the complete domain, we will keep harvesting without storing the data permanently, so that we at least gather some important information about the .at domain. We will not be able to run another domain crawl in 2010: due to budget constraints, our storage was cut to 2 TB for this year. Therefore, we will focus on selective and event harvesting.
  

Infrastructure: 
Due to problems with Derby, we migrated to a MySQL database. We upgraded to NAS 3.10 in mid-January, and it is running without problems. :-)
  

Activities 2010: 
We want to give public access to the web archive at ONB starting in spring 2010. Preparations for this (user interface, legal restrictions, information materials for users, etc.) are our main focus at the moment. The next step will be to open the archive to other libraries (including legal restrictions).
Winter Olympics: We are participating in the IIPC project and have nominated seeds with the nomination tool. We will also crawl a small number of seeds.
We will run an event harvest on the Austrian presidential election in April.
We will start permanent selective harvesting (develop a selection policy, select sites, harvesting, quality assurance).
We are preparing for iPres and the associated meetings (IWAW, NetarchiveSuite, Heritrix Advanced Users) in Vienna in September.
  

We are looking forward to our joint activities in 2010. If you have any questions or comments, please let me know.

Best Regards 

Michaela 

  

Michaela Mayr
Webarchive / Department Digital Preservation 

Austrian National Library
Josefsplatz 1
A-1015 Vienna 
fon: (+43 1) 53 410-476
fax: (+43 1) 53 410-610
michaela.mayr at onb.ac.at 
http://www.onb.ac.at/ev/about/webarchive.htm 



  
