[Netarchivesuite-users] CRAWL ENDING - Finished - Ended by operator

Kaare Fiedler Christiansen kfc at statsbiblioteket.dk
Fri Jun 6 09:17:56 CEST 2008


On Fri, 2008-06-06 at 08:57 +0200, Kaare Fiedler Christiansen wrote:
> On Fri, 2008-06-06 at 08:34 +0200, aponb at gmx.at wrote:
> > A configuration that is started every four hours sometimes produces
> > the following message in the crawler log:
> > "CRAWL ENDING - Finished - Ended by operator"
> > instead of only
> > "CRAWL ENDING - Finished"
> > 
> > and in fact some pages that should have been crawled are missing
> > from these jobs.
> > 
> > Do you know what's the reason for that behavior?
> 
> "Ended by operator" appears when the system itself requests that the
> crawl be stopped.
> 
> This is done when a harvester has been inactive for a long period,
> although there are still URLs in the queue. The amount of time before
> the harvesters are stopped is defined by the two settings:
> 
> settings.harvester.harvesting.heritrix.inactivityTimeout
> settings.harvester.harvesting.heritrix.noresponseTimeout

I should make it absolutely clear that this is of course the amount of
time *with no activity* before the harvest is stopped. We don't stop a
harvester that is still active.
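
For illustration, a rough sketch of how the two settings might be raised
in a settings file. It assumes that the dotted setting names map onto
nested XML elements and that the values are given in seconds; the value
3600 is only a placeholder, not a recommendation, so please check both
assumptions against the documentation for your installed version.

```xml
<!-- Sketch only: element nesting and units (seconds) are assumptions -->
<settings>
  <harvester>
    <harvesting>
      <heritrix>
        <!-- time with no crawler activity before the harvest is stopped -->
        <inactivityTimeout>3600</inactivityTimeout>
        <!-- time with no response before the harvest is stopped -->
        <noresponseTimeout>3600</noresponseTimeout>
      </heritrix>
    </harvesting>
  </harvester>
</settings>
```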

Best,
  Kåre



