[Netarchivesuite-users] production and maintenance questions
sara.aubry at bnf.fr
Wed Feb 17 17:02:18 CET 2010
Dear all,
We are still going through our large test crawl and are running into further problems.
1) We noticed that for many jobs (we don't have the exact figure yet, but
it is a significant number),
we are missing Heritrix report files, which are important to us because we
are using them for stats.
Our crawl engineer saw that the HarvestController does not leave Heritrix
much time to compile its
reports after a job finishes, and shuts Heritrix down very quickly. Is there
a way to extend this timeout?
2) One of our HarvestControllers went down just after a job finished. Once
again, Heritrix did not have
time to create the reports, so we used the make_reports.pl script
that comes with the
Heritrix package to create them from the crawl.log:
- Is there a way, or a script in the NetarchiveSuite package, to create the metadata ARC
file?
- How should we transfer the ARC files? We are using the local ARC
repository
implementation. We looked at the upload.sh script but it doesn't look like
it is going to update the database.
- Can we manually restart our HarvestController?
Thanks for your help!
Sara