[Netarchivesuite-users] Oldjobs directory growing too big

Søren Vejrup Carlsen svc at kb.dk
Tue Apr 28 11:38:45 CEST 2009

Hi Nicolas.

The oldjobs directory is the final resting place for crawl directories after a crawl has finished or been aborted and its files have been uploaded to the archive.


All files that are not (yet) stored in the metadata-1.arc file remain in these directories indefinitely, or until they are deleted manually.


This directory is not cleared automatically!

But it is safe to clear it at almost any time. The one exception is the moment when a crawl directory is being moved into oldjobs, which happens after the post-processing of the job has finished.
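
Since the directory has to be cleared by hand, a small script can do it. The following is only a minimal sketch and not part of NetarchiveSuite: the function name, the age threshold, and the assumption that each job lives in its own subdirectory of oldjobs are illustrative. It skips directories modified recently, as a rough guard against removing a crawl directory that is still being moved in.

```python
import shutil
import time
from pathlib import Path


def prune_old_job_dirs(oldjobs, min_age_days=7.0, dry_run=True, now=None):
    """Remove subdirectories of `oldjobs` not modified for `min_age_days` days.

    Hypothetical helper, not a NetarchiveSuite API. Returns the directories
    that were removed (or, in a dry run, would be removed).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    selected = []
    for entry in sorted(Path(oldjobs).iterdir()):
        # Only real directories are candidates; plain files and symlinks stay.
        if entry.is_symlink() or not entry.is_dir():
            continue
        # A recently modified directory may still be in the middle of
        # being moved into oldjobs, so leave it alone.
        if entry.stat().st_mtime > cutoff:
            continue
        selected.append(entry)
        if not dry_run:
            shutil.rmtree(entry)
    return selected
```

Calling it with dry_run=True first (the default) only lists the candidate directories; passing dry_run=False actually deletes them. The path to oldjobs depends on your harvester installation.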


About clearing the broker's queue: The contents of oldjobs have nothing to do with the contents of the broker's queue.

In the broker's queue, you may have outstanding crawl jobs waiting for an available harvester.


I hope this answers your question.



From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of nicolas.giraud at bnf.fr
Sent: 28 April 2009 10:55
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Oldjobs directory growing too big



The oldjobs directory on my crawler machine tends to get very big. What exactly is stored in this directory? Is it Heritrix instances for failed jobs? When is that folder cleared? Is it safe to clear it and then clear the broker's queue?


