[Netarchivesuite-devel] Known problems are back
aponb at gmx.at
Fri Aug 10 12:03:40 CEST 2018
After starting our yearly Domain Crawl we are experiencing a combination
of known problems.
On the one hand, the "Multiple duplicate Jobs Created" problem
(https://sbforge.org/jira/browse/NAS-2682) affects our daily crawls.
Not only are duplicate jobs created, but the affected HarvestDefinitions
are also deactivated. This is annoying, because you have to reactivate
these definitions manually; otherwise no further crawls take place.
This is the log message:
20:14:34.445 WARN d.n.h.scheduler.HarvestJobGenerator - Exception while
scheduling harvestdefinition #105(20180630_EU_Ratspraesidentschaft2018).
The harvestdefinition has been deactivated!
dk.netarkivet.common.exceptions.PermissionDenied: Somebody else must
have updated HD #105: '20180630_EU_Ratspraesidentschaft2018' since
edition 51, not updating
    at dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO.update(HarvestDefinitionDBDAO.java:459)
    at dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$JobGeneratorThread.run(HarvestJobGenerator.java:256)
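The wording of the exception ("since edition 51, not updating") reads like an optimistic edition check: an update is only accepted if the caller still holds the edition that is currently stored. The following is only a minimal sketch of that general pattern, not the actual HarvestDefinitionDBDAO code; the class and method names are hypothetical. It shows how two writers that both read edition 51 collide, which would fit two job generator threads handling the same definition.

```java
final class EditionCheckDemo {

    /** Hypothetical stand-in for a stored harvest definition. */
    static final class StoredDefinition {
        private long edition;

        StoredDefinition(long edition) {
            this.edition = edition;
        }

        synchronized long edition() {
            return edition;
        }

        /** Accepts the update only if the caller read the current edition. */
        synchronized boolean update(long editionReadByCaller) {
            if (editionReadByCaller != edition) {
                return false; // somebody else updated in the meantime
            }
            edition++; // a successful update bumps the edition
            return true;
        }
    }

    public static void main(String[] args) {
        StoredDefinition hd = new StoredDefinition(51);
        long readByA = hd.edition(); // both writers read edition 51
        long readByB = hd.edition();
        System.out.println("writer A updated: " + hd.update(readByA)); // true
        System.out.println("writer B updated: " + hd.update(readByB)); // false
    }
}
```

In this sketch the second writer is rejected because the first one has already moved the edition to 52.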
On the other hand, we get the following error, known as the
"Heritrix Address already in use" bug (e.g.
https://sbforge.org/jira/browse/NAS-1377 or
https://sbforge.org/jira/browse/NAS-2477), which was already
discussed some time ago. Since we started our domain crawl, this has been
happening constantly. Half of the jobs of the daily crawls fail with the
following exception: "dk.netarkivet.common.exceptions.IOFailure: Port
XXXX already in use, or port is out of range". It also affects jobs
of the domain crawl, so you constantly need to resubmit failed jobs to
get a full crawl. Normally, all of this should work automatically.
We do not have duplicate ports on our crawler machines, and we have
been using the same deploy settings for years.
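To narrow down whether a port is really still occupied at the moment a job fails, a small stand-alone check could be run on the crawler machine. This is only a diagnostic sketch and not part of NetarchiveSuite; the class name PortCheck is hypothetical, and the ports to test are passed on the command line. It simply tries to bind a listening socket to each port.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {

    /** Returns true if a TCP listener can currently be bound to the port. */
    public static boolean isPortFree(int port) {
        if (port < 1 || port > 65535) {
            return false; // out of range
        }
        // Try to bind; the socket is closed again right away.
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        for (String arg : args) {
            int port = Integer.parseInt(arg);
            String state = isPortFree(port) ? "free" : "in use or out of range";
            System.out.println(port + ": " + state);
        }
    }
}
```

A result of "free" only describes the instant of the check; another process can still take the port before the next Heritrix instance starts, so this helps with diagnosis but is not a fix.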
Does anyone have any ideas for workarounds? That would be great, because
the "Heritrix Address already in use" bug in particular is seriously
disrupting our daily work.
Regards
a.