[Netarchivesuite-users] Oldjobs directory growing too big

Bjarne Andersen bja at statsbiblioteket.dk
Sat May 2 12:28:03 CEST 2009


You cannot resubmit your job in the GUI until you have done the workaround for jobs that never leave status STARTED.

You have to find an "empty" harvester (one not running any job), copy the job directory back to that harvester instance, and restart that specific harvester (kill_harvester_PORTNR.sh / start_harvester_PORTNR.sh; I'm not sure whether these are available if you have not used the deploy application, of which a completely new version is available now).
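
The copy-and-restart steps above could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: the function names, the idea that the job directory is copied in as a subdirectory of the harvester's working directory, and the location of the kill/start scripts (conf_dir) are all hypothetical and must be adapted to your installation. Only the script name pattern kill_harvester_PORTNR.sh / start_harvester_PORTNR.sh comes from this thread.

```python
import shutil
import subprocess
from pathlib import Path


def restore_job_dir(old_job_dir: Path, harvester_dir: Path) -> Path:
    """Copy a stranded job directory back into an idle harvester's directory.

    Assumption: the job directory is placed as a subdirectory of
    harvester_dir; the real layout depends on your installation.
    """
    if not old_job_dir.is_dir():
        raise FileNotFoundError(f"no such job directory: {old_job_dir}")
    target = harvester_dir / old_job_dir.name
    if target.exists():
        # Refuse to overwrite: the harvester may not be as idle as assumed.
        raise FileExistsError(f"{target} already exists")
    shutil.copytree(old_job_dir, target)
    return target


def restart_commands(conf_dir: Path, port: int) -> list[list[str]]:
    """Build the kill/start commands for the harvester on the given port.

    Assumption: both scripts live in conf_dir (hypothetical location).
    """
    return [
        [str(conf_dir / f"kill_harvester_{port}.sh")],
        [str(conf_dir / f"start_harvester_{port}.sh")],
    ]


def restart_harvester(conf_dir: Path, port: int) -> None:
    """Run the kill script, then the start script, stopping on the first failure."""
    for cmd in restart_commands(conf_dir, port):
        subprocess.run(cmd, check=True)
```

A call would look like restore_job_dir(Path("/path/to/oldjobs/JOBDIR"), Path("/path/to/idle_harvester")) followed by restart_harvester(Path("/path/to/conf"), PORTNR), with all paths and the port number replaced by your own. Do not resubmit the job in the GUI in between; the restarted harvester is expected to pick up the old data on its own.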

After the job reports status FAILED back to the GUI, you can look at the statistics and, based on that, decide whether the job should run again or whether you are happy with the amount of data harvested.

best
Bjarne
________________________________________
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of aponb at gmx.at [aponb at gmx.at]
Sent: 2 May 2009 11:43
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users]  Oldjobs directory growing too big

>
> On rare occasions (e.g. when a crawler loses the JMS-connection during a crawl) the 3rd step of the above will fail (mostly also the 2nd) because the harvester-application cannot send either upload-messages or the job-finished message. In these cases the jobs will not get reported as finished in the database and will remain in status STARTED. The only way to fix this currently is to copy the entire contents of a job-directory back to a harvester-instance (not running other jobs) and restart that instance. That will make the harvester find the old data and do what's necessary, which can be all three steps if required.
>
> All this error handling is currently a manual process - but luckily it does not happen that often
>
>

How can I restart that particular instance?
I put the content back into that harvester dir and then resubmitted
the failed job from the user interface. That is probably not the right
way, because it created a new job, which started a new crawl.
Thanks for your help!
a.



_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users



