[Netarchivesuite-users] Heritrix does not stop at limit.

Svein Yngvar Willassen svein at willassen.no
Fri Apr 11 11:37:23 CEST 2008


Hi all,

I configured a selective crawl with a limit of 1 500 000 000 bytes (1.5
GB). This limit shows up in Heritrix's admin console as group-max-all-kb set
to 1464844, which appears to be correct (1464844 * 1024 ~= 1.5 GB).
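
As a sanity check on that conversion, here is a minimal sketch. It assumes
the kb value is simply the byte limit divided by 1024 and rounded to the
nearest integer; the function name is made up for illustration and is not
part of Heritrix or NetarchiveSuite.

```python
# Byte limit configured for the selective crawl.
LIMIT_BYTES = 1_500_000_000


def bytes_to_kb(n_bytes: int) -> int:
    """Convert bytes to units of 1024 bytes, rounded to the nearest integer.

    Assumption: this mirrors how the byte limit ends up as group-max-all-kb.
    """
    return round(n_bytes / 1024)


limit_kb = bytes_to_kb(LIMIT_BYTES)
print(limit_kb)         # 1464844, the value shown in the admin console
print(limit_kb * 1024)  # 1500000256, i.e. about 1.5 GB
```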

But the crawler has now run for about 24 hours, and in the Heritrix admin
console, the amount of crawled content is reported to be 1.6 GB. The total
number of bytes in the ARC files is about 1 700 000 000.
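
To put the numbers side by side, a small sketch of the arithmetic. Whether
the admin console reports sizes in binary units (1024^3 bytes) is only an
assumption on my part; if it does, the console figure and the ARC file total
may describe roughly the same amount of data.

```python
LIMIT_BYTES = 1_500_000_000  # configured limit
ARC_BYTES = 1_700_000_000    # approximate total size of the ARC files
GIB = 1024 ** 3              # assumption: console uses binary gigabytes

arc_gib = ARC_BYTES / GIB
limit_gib = LIMIT_BYTES / GIB
overshoot_pct = (ARC_BYTES - LIMIT_BYTES) / LIMIT_BYTES * 100

print(f"ARC files: {arc_gib:.2f} GiB")      # 1.58
print(f"Limit:     {limit_gib:.2f} GiB")    # 1.40
print(f"Overshoot: {overshoot_pct:.1f} %")  # 13.3
```

Either way, the ARC files are about 13 % over the configured limit.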

Why doesn't it stop at 1.5 GB? Is there a difference between the content
counted by the QuotaEnforcer and the size of the ARC files?

-- 
Best Regards,

Svein Y. Willassen
http://willassen.blogspot.com/