[Netarchivesuite-users] NAS/Heritrix doesn't obey byte limits

Peter Svanberg Peter.Svanberg at kb.se
Mon Mar 18 00:07:24 CET 2019

Hello, NAS users and others!

We are seeing very strange behavior from NAS/Heritrix (see the attached Excel file, with comments):

The harvest reports say "Stopped due to … byte/object limit reached" at very different levels: sometimes far above the limit (more than five times over), sometimes far below it. We fail to see any pattern in this; it seems more or less random.

What are we doing wrong? Is there an error in the harvest template? (The template is attached below each table.)

Or, if it is some kind of bug, are there workarounds?

We would much appreciate any hints, as this is quite a problem for us, both for the ongoing selective harvests and for the upcoming big snapshot run!

(We are running version 5.4.2. I hope that doesn't matter for this problem, as we can't upgrade right now.)

Best regards,

Peter Svanberg
Technical officer
Digital Collections Department, Newspapers, Radio and Television Division

National Library of Sweden
PO Box 5039
SE-104 51 Stockholm
Visits: Karlavägen 100, Stockholm
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se

-------------- next part --------------
A non-text attachment was scrubbed...
Name: NASLimitProblemsSweden.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 72450 bytes
Desc: NASLimitProblemsSweden.xlsx
URL: <https://ml.sbforge.org/pipermail/netarchivesuite-users/attachments/20190317/e2828dc6/attachment-0001.xlsx>
