[Netarchivesuite-users] Running out of disk space

aponb at gmx.at aponb at gmx.at
Tue Sep 7 16:30:18 CEST 2010


I have now linked the arc directory of that job to an NFS share, and it
seems to work. Heritrix is continuing to crawl and has already created
new arc files. Thanks for the hint!

The only strange thing is that the four arc files that were in an open
state after pausing (for example
3666-21-20100906155821-03287-webcrawler03.onb.ac.at.arc.gz.open) were
left untouched and were not modified after the crawl continued.
When I tested this earlier on my local machine, an open file was used
again after continuing. I expected the same with my production files,
but they stayed as they were.
Do you also think that these four files are lost now? Should I rename
them to .gz, since I don't expect them to be used again? I also don't
know whether they are in a valid gz format while they are in an open
state.
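
One way to check the gz validity question would be to try decompressing
each .open file from start to end. The following is only a minimal
sketch in Python, not something NetarchiveSuite or Heritrix provides:
the function name check_gzip and the chunk size are made up for
illustration, and it assumes the file is an ordinary (possibly
multi-member) gzip stream that Python's standard gzip module can read.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Try to decompress the whole file at `path`.

    Returns (ok, bytes_read): ok is True only if the file decompresses
    cleanly to the end; bytes_read is the amount of uncompressed data
    that could be read before the end or before the first error.
    (Hypothetical helper for illustration only.)
    """
    bytes_read = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                bytes_read += len(chunk)
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ends before its end-of-stream marker.
        # OSError / zlib.error: not gzip data, or corrupt data.
        return False, bytes_read
    return True, bytes_read
```

If check_gzip reports ok for a copy of a .open file, that only shows
the gzip layer is readable; it does not confirm that the last arc
record in it is complete. Running the check on a copy rather than on
the original seems the safer choice.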
I am very interested in your opinions!

