[Netarchivesuite-users] Running out of disk space

Bjarne Andersen bja at statsbiblioteket.dk
Tue Sep 7 16:37:55 CEST 2010

Try copying them to another location and gunzipping them there. If that works, I think you could simply rename them to .gz. Since Heritrix has started new arc files, it seems to have forgotten the .open ones. NetarchiveSuite would also rename them for you, at least for uncompressed arc files; I'm not sure it does so for .gz files.
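The copy-then-gunzip check could be scripted roughly as below. This is a minimal sketch, not a NetarchiveSuite or Heritrix tool: the function names `gunzip_ok` and `recover_open_file` are hypothetical, and it assumes the file is no longer being written by Heritrix. It decompresses a copy end to end and drops the `.open` suffix only if that succeeds.

```python
import gzip
import shutil
import tempfile
import zlib
from pathlib import Path
from typing import Optional


def gunzip_ok(path: Path) -> bool:
    """Return True if the whole file decompresses cleanly and is non-empty."""
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            # Read in 1 MiB chunks so large arc files need not fit in memory.
            while True:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                total += len(chunk)
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ends mid-member (truncated write).
        # OSError (incl. gzip.BadGzipFile): not gzip data at all.
        # zlib.error: corrupt compressed data.
        return False
    return total > 0


def recover_open_file(open_file: Path) -> Optional[Path]:
    """Test a copy of an '*.arc.gz.open' file; if it is valid, strip '.open'."""
    open_file = Path(open_file)
    if open_file.suffix != ".open":
        raise ValueError(f"not an .open file: {open_file}")
    # Mirror the copy-first approach: test a copy in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / open_file.name
        shutil.copy2(open_file, copy)
        if not gunzip_ok(copy):
            return None
    target = open_file.with_suffix("")  # 'x.arc.gz.open' -> 'x.arc.gz'
    if target.exists():
        raise FileExistsError(target)
    open_file.rename(target)
    return target
```

A `None` result leaves the original `.open` file in place, so nothing is lost by trying.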

Sent from my HTC Touch Pro

----- Original message -----
From: aponb at gmx.at <aponb at gmx.at>
Sent: 7 September 2010 16:29
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk <netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Subject: [Netarchivesuite-users] Running out of disk space


I have now linked the arc directory of that job to an NFS share, and it
seems to work. Heritrix is continuing to crawl and has already created new
arc files. Thanks for the hint!
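Linking a job's arc directory to an NFS share can be done by moving the directory onto the share and leaving a symbolic link at the old path. The message does not say exactly how the link was made, so this is only one possible way, sketched under the assumption that the crawl is paused and the NFS share is already mounted; `relocate_with_symlink` is a hypothetical helper.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(arc_dir: Path, nfs_root: Path) -> Path:
    """Move arc_dir under nfs_root and leave a symlink at the old location.

    Assumes the crawl is paused so no file in arc_dir is being written.
    """
    arc_dir = Path(arc_dir)
    target = Path(nfs_root).absolute() / arc_dir.name
    if arc_dir.is_symlink():
        raise ValueError(f"{arc_dir} is already a symlink")
    if target.exists():
        raise FileExistsError(target)
    # Across filesystems shutil.move copies the tree and then deletes the source.
    shutil.move(str(arc_dir), str(target))
    # The crawler keeps using the old path, which now resolves to the NFS share.
    os.symlink(target, arc_dir, target_is_directory=True)
    return target
```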

The only strange thing is that the four arc files that were in an open
state after pausing (for example
3666-21-20100906155821-03287-webcrawler03.onb.ac.at.arc.gz.open) remained
untouched and were not modified after crawling resumed.
When I tested this earlier on my local machine, an open file was used
again after resuming. I expected the same with my production files, but
they stayed as they were.
Do you also think these four files are now lost? Should I rename them to
.gz, since I don't expect them to be used again? I also don't know
whether they are in a valid gzip format while they are still open.
I am very interested in your opinions!
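Whether an `.open` file is valid gzip can be checked member by member. The sketch below assumes each arc record in a `.arc.gz` file is written as its own gzip member (the usual layout for compressed arc files, but treat it as an assumption here); `count_gzip_members` is a hypothetical helper. It counts the complete members and reports whether a partial or invalid tail follows them.

```python
import zlib
from pathlib import Path
from typing import Tuple


def count_gzip_members(path: Path) -> Tuple[int, bool]:
    """Count complete gzip members; also report whether a partial tail follows.

    Reads the whole file into memory, which is acceptable for a one-off check.
    """
    data = Path(path).read_bytes()
    complete = 0
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        try:
            d.decompress(data)  # output is discarded; only d.eof matters
        except zlib.error:
            return complete, True  # bytes after the last good member are not gzip
        if not d.eof:
            return complete, True  # input ran out in the middle of a member
        complete += 1
        data = d.unused_data  # whatever follows this member's trailer
    return complete, False
```

A result of `(n, False)` means the file ends cleanly on a member boundary; `(n, True)` means the last member was cut off or followed by invalid bytes.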

NetarchiveSuite-users mailing list
NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
