[Netarchivesuite-users] production and maintenance questions

sara.aubry at bnf.fr sara.aubry at bnf.fr
Wed Feb 17 17:02:18 CET 2010


Dear all,

We are still going through our large test crawl and are facing further problems.

1) We noticed that many jobs (we don't have the exact figure yet, but 
it is a significant number)
are missing their Heritrix report files, which are important to us because we 
use them for statistics.
Our crawl engineer saw that the HarvestController leaves Heritrix very little 
time to compile its 
reports after a job finishes and shuts Heritrix down very quickly. Is there 
a way to extend this timeout?

2) One of our HarvestControllers went down just after a job finished. Once 
again, Heritrix did not have
time to create the reports, so we used the make_reports.pl script, 
which comes with the
Heritrix package, to create them from the crawl.log. This leaves us with 
some questions:
- Is there a way, or a script in the NetarchiveSuite package, to create the 
metadata ARC file? 
- How should we transfer the ARC files? We are using the local ARC 
repository 
implementation. We looked at the upload.sh script, but it does not appear 
to update the database.
- Can we restart our HarvestController manually?

Thanks for your help!

Sara




