<div dir='auto'><div>Sorry for the late reply; I was going through the logs to find the possible cause.<div dir="auto"><br></div><div dir="auto">Apparently it is not caused by one specific site but by many.</div><div dir="auto"><br></div><div dir="auto">The logs show a lot of Heritrix-specific status codes, such as:</div><div dir="auto">"-1" (DNS lookup failed);</div><div dir="auto">"-50" (temporary status assigned to URIs awaiting preconditions; appearance in logs may be a bug);</div><div dir="auto">"-5003" (blocked due to exceeding an established quota).</div><div dir="auto"><br></div><div dir="auto">Also, I let the harvest run for 5 days, and the logs show that it still managed to fetch URIs with normal status codes such as 200, 404 and 302. However, the time between processing successive URIs varied from a few hours up to several days, with the threads in TIMED_WAITING state. Eventually I was forced to terminate the job manually.</div><div dir="auto"><br></div><div dir="auto">I know that you can specify a value for "crawlLimiter.maxTimeSeconds" to cap the crawl duration, but I would also like to know what could have caused this.</div><br><div class="gmail_extra"><br><div class="gmail_quote">On Apr 30, 2018 14:14, Søren Vejrup Carlsen &lt;svc@kb.dk&gt; wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<div>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Hi Koit.</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Is it only a specific website that causes the problem, or is this a general problem?</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">In any case, you can always log in to the Heritrix 3 instance and terminate the job manually,</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">using the following credentials (user: admin, password: adminPassword),</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">unless you have changed these values.</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><b><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Søren Vejrup Carlsen</span></b><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"><br>
IT-konsulent<br>
</span><span style="font-size:10pt;font-family:'arial' , sans-serif;color:gray">IT consultant</span><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"></span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">IT-Udvikling.København</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:gray">ITUI</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">+4591324841</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"><a href="mailto:svc@kb.dk">svc@kb.dk</a></span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Det Kgl. Bibliotek</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Royal Danish Library</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">Søren Kierkegaards Plads 1</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">DK-1221 København K</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061">+45 3347 4747</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#5a5a5a">CVR 2898 8842</span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#5a5a5a">EAN 5798 000 795297</span><span style="font-size:11pt;font-family:'calibri' , sans-serif;color:#244061"></span></p>
<p><span style="font-size:10pt;font-family:'arial' , sans-serif;color:#244061"> </span></p>
<p><b><span style="font-size:11pt;font-family:'calibri' , sans-serif">From:</span></b><span style="font-size:11pt;font-family:'calibri' , sans-serif"> NetarchiveSuite-users [mailto:netarchivesuite-users-bounces@ml.sbforge.org]
<b>On Behalf Of </b>Koit Summatavet<br>
<b>Sent:</b> Monday, April 30, 2018 11:50 AM<br>
<b>To:</b> netarchivesuite-users@ml.sbforge.org<br>
<b>Subject:</b> [Netarchivesuite-users] Issue - harvest running infinitely</span></p>
<p> </p>
<div>
<p>Hi,</p>
<div>
<p> </p>
</div>
<div>
<p>I have started using NetarchiveSuite (NAS) to harvest Estonian websites, and I have encountered a problem:</p>
</div>
<div>
<p> </p>
</div>
<div>
<p>When the harvest hits neither the document limit nor the size limit, it runs indefinitely, and all the threads are in the TIMED_WAITING state, where they wait for hours or even days. The longer it runs, the longer the waits become, so URLs are processed very slowly and only after a long delay.</p>
</div>
<div>
<p> </p>
</div>
<div>
<p>How can I stop this from happening, and what changes should I make in the harvest template?</p>
</div>
<div>
<p> </p>
</div>
<div>
<p>I am using NAS version 5.3.1. Does the same happen in version 5.4?</p>
</div>
<div>
<p> </p>
</div>
<div>
<p>With regards,</p>
</div>
<div>
<p>Koit</p>
</div>
</div>
</div>
</div>
</blockquote></div><br></div></div></div>