[Netarchivesuite-users] Harvesting aborted: Heritrix timeouts
Nicchiarelli Eleonora
eleonora.nicchiarelli at onb.ac.at
Tue Mar 2 10:52:23 CET 2010
Dear all,
after some delay, I am again investigating the issue that caused some of our jobs to end with "harvesting aborted". In the metadata ARC file, a typical log line for a domain whose harvesting was aborted reads:
2010-02-05T13:17:06.044Z 200 6580 http://www.brueckenbauten.at/pics/002271.jpg LE http://www.brueckenbauten.at/pages/grp_plan.html image/jpeg #106 20100205125704503+1201540 sha1:C2LNBANJBOXHN54RSARCGBN6VQM7KOFN - timeTrunc,content-size:6840
From this I understand that the download was truncated after a timeout of approximately 20 minutes (all lines show a similar fetch duration, about 1,200,000 milliseconds). I now have two questions:
- Where is this timeout normally configured? The Heritrix documentation only refers to "configured limits" and does not say where this value is set.
- What influence does this timeout have on the inactivity and response timeouts, both of which we have set to 10800 seconds (three hours)?
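
As a check on the 20-minute reading, here is a minimal Python sketch that splits the log line quoted above on whitespace and extracts the fetch duration and the annotations. It assumes the usual 12-column crawl log layout, in which the ninth column holds the fetch start timestamp followed by "+" and the duration in milliseconds, and the last column holds the comma-separated annotations. The function names are only illustrative and are not part of Heritrix or NetarchiveSuite.

```python
# The log line quoted in this message, reassembled as one string.
LINE = (
    "2010-02-05T13:17:06.044Z 200 6580 "
    "http://www.brueckenbauten.at/pics/002271.jpg LE "
    "http://www.brueckenbauten.at/pages/grp_plan.html image/jpeg #106 "
    "20100205125704503+1201540 "
    "sha1:C2LNBANJBOXHN54RSARCGBN6VQM7KOFN - timeTrunc,content-size:6840"
)


def fetch_duration_ms(log_line):
    """Return the fetch duration in milliseconds (assumed to be in column 9)."""
    fields = log_line.split()
    # Column 9 looks like "<start timestamp>+<duration in ms>".
    _start, _sep, duration = fields[8].partition("+")
    return int(duration)


def annotations(log_line):
    """Return the comma-separated annotations from the last column."""
    return log_line.split()[-1].split(",")


if __name__ == "__main__":
    ms = fetch_duration_ms(LINE)
    print(ms, "ms =", round(ms / 60000, 2), "minutes")  # 1201540 ms = 20.03 minutes
    print("truncated by time limit:", "timeTrunc" in annotations(LINE))
```

Running the same functions over every line of the job's log would show whether all truncated fetches cluster around the same duration.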
Thanks in advance,
Eleonora
Eleonora Nicchiarelli Bettelli
Digital Preservation
Austrian National Library
Josefsplatz 1, A-1015 Vienna
Tel.: +43 1 53410-686
Fax : +43 1 53410-610
Email: eleonora.nicchiarelli at onb.ac.at
http://www.onb.ac.at