[Netarchivesuite-users] Oldjobs directory growing too big

Bjarne Andersen bja at statsbiblioteket.dk
Wed Apr 29 10:32:17 CEST 2009


This is the very simple script we use to empty the oldjobs directory after failed uploads
(adjust the paths to your installation):
-------------
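#!/bin/bash
export CLASSPATH=~/PROD/lib/dk.netarkivet.archive.jar
shopt -s nullglob   # a pattern that matches no files expands to nothing, not to itself
# Re-upload the harvested ARC files, then the metadata ARC files
printf '%s\n' ~/PROD/oldjobs/*/arcs/*.arc | xargs -r java dk.netarkivet.archive.tools.Upload
printf '%s\n' ~/PROD/oldjobs/*/metadata/*.arc | xargs -r java dk.netarkivet.archive.tools.Upload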
-------------
The Java command-line tool should be documented in the manuals; it simply takes a file name as input and uploads that file.
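Since the tool works on file names, one variation is to call it once per file and record which uploads failed, so it is clear what is still left in oldjobs. This is a minimal sketch, not the NetarchiveSuite procedure: it assumes that `dk.netarkivet.archive.tools.Upload` exits with a non-zero status when an upload fails (not confirmed in this thread), and `UPLOAD_CMD` and `upload_each` are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch: upload ARC files one at a time and report failures.
# ASSUMPTION: the Upload tool returns a non-zero exit status on failure.
# UPLOAD_CMD and upload_each are hypothetical names; adjust paths as needed.
export CLASSPATH=${CLASSPATH:-$HOME/PROD/lib/dk.netarkivet.archive.jar}
UPLOAD_CMD=${UPLOAD_CMD:-"java dk.netarkivet.archive.tools.Upload"}

# Upload each file given as an argument and print one status line per file.
# Returns 0 if every upload succeeded, 1 otherwise.
upload_each() {
  local f failed=0
  for f in "$@"; do
    # $UPLOAD_CMD is unquoted on purpose so "java <class>" splits into words
    if $UPLOAD_CMD "$f"; then
      echo "uploaded: $f"
    else
      echo "FAILED: $f" >&2
      failed=$((failed + 1))
    fi
  done
  echo "$failed upload(s) failed"
  [ "$failed" -eq 0 ]
}

# Only touch the real oldjobs tree when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  shopt -s nullglob
  upload_each "$HOME"/PROD/oldjobs/*/arcs/*.arc "$HOME"/PROD/oldjobs/*/metadata/*.arc
fi
```

Calling the tool per file is slower than one batched call through `xargs`, but a failed file no longer hides among the successful ones.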

Recovering jobs that are not reported as FINISHED is still an entirely manual process here.
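For the manual part, a quick inventory of which job directories in oldjobs still hold ARC files, and how much disk the directory uses, can help decide what to re-upload or clean up. A minimal sketch, assuming the `oldjobs/<job>/arcs` and `oldjobs/<job>/metadata` layout used in the upload script above; `pending_arcs` and `pending_per_job` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: list jobs in oldjobs that still contain ARC files, with a file count.
# ASSUMPTION: layout <oldjobs>/<job>/{arcs,metadata}/*.arc as in the upload script.
# pending_arcs and pending_per_job are hypothetical helper names.

# Print every ARC file still waiting under the given oldjobs directory.
pending_arcs() {
  find "$1" -mindepth 3 -maxdepth 3 -type f -name '*.arc' |
    awk -F/ '$(NF-1) == "arcs" || $(NF-1) == "metadata"'
}

# Print "<job> <number of ARC files>" for each job that still has files.
pending_per_job() {
  pending_arcs "$1" |
    awk -F/ '{ n[$(NF-2)]++ } END { for (j in n) print j, n[j] }' | sort
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  dir=${1:-$HOME/PROD/oldjobs}
  if [ -d "$dir" ]; then
    pending_per_job "$dir"
    du -sh "$dir"   # total disk space used by oldjobs
  else
    echo "no such directory: $dir" >&2
  fi
fi
```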

best
Bjarne

________________________________________
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of nicolas.giraud at bnf.fr [nicolas.giraud at bnf.fr]
Sent: 29 April 2009 10:16
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: Re: [Netarchivesuite-users] Oldjobs directory growing too big

Hi Bjarne,

Thank you for your explanation, this is very useful information. Many of my jobs lose their JMX/RMI connection to Heritrix, so my oldjobs directory often grows very large and fills the disk on my test machine, which unfortunately has little storage.

About the command-line tool you are referring to, where is it in the distribution? Are there other tools? Where are they documented? I've been writing a tool of my own to create/update domains and their default seedlists from a huge list of URLs.

Best regards,
Nicolas

