[Netarchivesuite-curator] Update ONB
Mayr Michaela
michaela.mayr at onb.ac.at
Wed Apr 25 14:36:22 CEST 2012
Hi all,
* Our domain crawl is almost finished. Only a few rescheduled jobs still have to complete.
* We are now using a small Hadoop cluster, which is located on our crawler machines. It has 8 worker nodes, which can use 8 TB of HDFS storage. We use Apache Pig (http://pig.apache.org) for sorting our CDX index and generating statistical reports.
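
As a rough illustration of what sorting a CDX index and deriving a simple statistic involves, here is a minimal single-machine sketch in Python. It is not the Pig workflow mentioned above, which is not shown in this message. The sample lines, the space-separated field order (urlkey, timestamp, original URL, MIME type, status code, digest) and the function names sort_cdx and status_counts are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample CDX lines. Assumed field order (space-separated):
# urlkey timestamp original-url mimetype status digest
SAMPLE = [
    "at,example)/b 20120401120000 http://example.at/b text/html 200 AAA",
    "at,example)/a 20120402120000 http://example.at/a text/html 404 BBB",
    "at,example)/a 20120401120000 http://example.at/a text/html 200 CCC",
]


def sort_cdx(lines):
    """Return the lines sorted by urlkey, then by timestamp (first two fields)."""
    return sorted(lines, key=lambda line: tuple(line.split(" ", 2)[:2]))


def status_counts(lines):
    """Count records per HTTP status code (assumed to be the fifth field)."""
    return Counter(line.split(" ")[4] for line in lines)


sorted_lines = sort_cdx(SAMPLE)
counts = status_counts(SAMPLE)

for line in sorted_lines:
    print(line)
print(dict(counts))
```

On a cluster the same two steps (order by key, group and count) would be expressed in the cluster's own tooling. The sketch only shows the logic on a handful of lines.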
Best
Michaela
Michaela Mayr
Web@rchive Austria
Department Digital Preservation
Austrian National Library
Josefsplatz 1
A-1015 Vienna
fon: (+43 1) 53 410-476
fax: (+43 1) 53 410-610
michaela.mayr at onb.ac.at
http://www.onb.ac.at/ev/about/webarchive.htm