[Netarchivesuite-users] Harvesting aborted: Heritrix timeouts

Nicchiarelli Eleonora eleonora.nicchiarelli at onb.ac.at
Tue Mar 2 10:52:23 CET 2010


Dear all, 

after some delay, I am again investigating the issue that caused some of our jobs to end with "harvesting aborted". In the metadata ARC file, a typical log line for a domain whose harvest was aborted reads: 

2010-02-05T13:17:06.044Z   200       6580 http://www.brueckenbauten.at/pics/002271.jpg LE http://www.brueckenbauten.at/pages/grp_plan.html image/jpeg #106 20100205125704503+1201540 sha1:C2LNBANJBOXHN54RSARCGBN6VQM7KOFN - timeTrunc,content-size:6840

From this I understand that the download was truncated after a timeout of approximately 20 minutes (all lines show similar values for the fetch duration in milliseconds). I now have two questions:

- Where is this timeout normally configured? The Heritrix documentation refers only to "configured limits" and does not say where this value is set.

- How does this timeout interact with the inactivity and response timeouts, both of which we have set to 10800 seconds (three hours)? 
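For reference, the 20-minute reading can be checked mechanically. The following is a minimal sketch, assuming the standard 12-field Heritrix 1.x crawl.log layout, in which the ninth field is the fetch start time (yyyyMMddHHmmssSSS) followed by "+" and the fetch duration in milliseconds, and the last field holds the annotations. The function name `parse_crawl_log_line` is hypothetical, written only for this illustration.

```python
from datetime import datetime, timedelta

# Sample line copied from the crawl log quoted above.
SAMPLE_LINE = (
    "2010-02-05T13:17:06.044Z   200       6580 "
    "http://www.brueckenbauten.at/pics/002271.jpg LE "
    "http://www.brueckenbauten.at/pages/grp_plan.html image/jpeg #106 "
    "20100205125704503+1201540 sha1:C2LNBANJBOXHN54RSARCGBN6VQM7KOFN - "
    "timeTrunc,content-size:6840"
)


def parse_crawl_log_line(line):
    """Hypothetical helper: extract timing fields from one crawl.log line.

    Assumes the 12-field Heritrix 1.x layout, where field 9 is
    <fetch start yyyyMMddHHmmssSSS>+<duration in ms>.
    """
    fields = line.split()  # runs of spaces are collapsed
    if len(fields) < 12:
        raise ValueError("expected at least 12 whitespace-separated fields")
    start_str, _, duration_str = fields[8].partition("+")
    # First 14 digits: date and time to the second; the rest: milliseconds.
    fetch_start = datetime.strptime(start_str[:14], "%Y%m%d%H%M%S") + timedelta(
        milliseconds=int(start_str[14:])
    )
    duration_ms = int(duration_str) if duration_str else 0
    return {
        "status": int(fields[1]),
        "uri": fields[3],
        "fetch_start": fetch_start,
        "duration_ms": duration_ms,
        "fetch_end": fetch_start + timedelta(milliseconds=duration_ms),
        "annotations": fields[11].split(","),
    }


if __name__ == "__main__":
    rec = parse_crawl_log_line(SAMPLE_LINE)
    print(rec["duration_ms"] / 60000, "minutes")  # about 20 minutes
    print(rec["fetch_end"].isoformat())
    print("timeTrunc" in rec["annotations"])
```

For the sample line, the start time 12:57:04.503 plus 1201540 ms gives an end time of 13:17:06.043, within a millisecond of the entry's own timestamp (13:17:06.044Z). This is consistent with reading "+1201540" as a fetch that ran for about 20 minutes before being cut off with the "timeTrunc" annotation.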

Thanks in advance, 

Eleonora

Eleonora Nicchiarelli Bettelli
Digital Preservation
Austrian National Library
Josefsplatz 1, A-1015 Vienna

Tel.:  +43 1 53410-686
Fax :  +43 1 53410-610
Email: eleonora.nicchiarelli at onb.ac.at
http://www.onb.ac.at
