<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
After starting our yearly Domain Crawl we are experiencing a
combination of known problems.<br>
On the one hand, the "Multiple duplicate Jobs Created" bug
(<a class="moz-txt-link-freetext" href="https://sbforge.org/jira/browse/NAS-2682">https://sbforge.org/jira/browse/NAS-2682</a>) hits our daily
crawls: not only are duplicate jobs created, but the affected
HarvestDefinitions are also deactivated. This is annoying, because
you have to reactivate these definitions manually, otherwise there
will be no further crawls.<br>
This is the log message:<br>
<pre>
20:14:34.445 WARN d.n.h.scheduler.HarvestJobGenerator - Exception while scheduling harvestdefinition #105(20180630_EU_Ratspraesidentschaft2018). The harvestdefinition has been deactivated!
dk.netarkivet.common.exceptions.PermissionDenied: Somebody else must have updated HD #105: '20180630_EU_Ratspraesidentschaft2018' since edition 51, not updating
        at dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO.update(HarvestDefinitionDBDAO.java:459)
        at dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$JobGeneratorThread.run(HarvestJobGenerator.java:256)
</pre>
<br>
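A stop-gap that mimics the manual "Activate" click is to flip the
flag directly in the harvest database. This is only a minimal
sketch: the JDBC URL and credentials are placeholders, and the
table/column names are our reading of the NetarchiveSuite harvest
database schema, so please check them against your installation:<br>
<pre>
// Re-activate harvest definitions that the scheduler deactivated
// after the optimistic-locking failure above. Assumption: the
// harvest database has a harvestdefinitions table with an isactive
// column, as in the stock NetarchiveSuite schema.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ReactivateHarvestDefinitions {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://harvestdb:5432/nas"; // placeholder
        try (Connection con = DriverManager.getConnection(url, "nas", "secret");
             PreparedStatement ps = con.prepareStatement(
                 // Careful: this re-activates every inactive definition,
                 // including ones deactivated on purpose; narrow the
                 // WHERE clause (e.g. by name) if that matters to you.
                 "UPDATE harvestdefinitions SET isactive = 1 WHERE isactive = 0")) {
            System.out.println("Re-activated " + ps.executeUpdate()
                    + " harvest definitions");
        }
    }
}
</pre>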
<br>
On the other hand, we get the following error, known as the
"Heritrix Address already in use" bug (e.g.
<a class="moz-txt-link-freetext" href="https://sbforge.org/jira/browse/NAS-1377">https://sbforge.org/jira/browse/NAS-1377</a> or
<a class="moz-txt-link-freetext" href="https://sbforge.org/jira/browse/NAS-2477">https://sbforge.org/jira/browse/NAS-2477</a>), which has been
discussed here before. Since starting our domain crawl, this
happens all the time: about half of the jobs of our daily crawls
fail with the exception
"dk.netarkivet.common.exceptions.IOFailure: Port XXXX already in
use, or port is out of range". The same happens to jobs of the
domain crawl, so we constantly have to resubmit failed jobs to get
a complete crawl, although all of this should normally happen
automatically.<br>
<br>
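To see whether the configured port is genuinely still bound at that
moment (for example by a Heritrix JVM left over from a previous
job), a quick probe on the crawler machine helps; the port number
is the one from your deploy settings:<br>
<pre>
// Try to bind the Heritrix port ourselves: if this fails, some
// process still holds it and the failed job cannot start Heritrix.
import java.net.ServerSocket;

public class PortProbe {
    public static void main(String[] args) throws Exception {
        int port = Integer.parseInt(args[0]); // e.g. the Heritrix GUI port
        try (ServerSocket s = new ServerSocket(port)) {
            System.out.println("Port " + port + " is free");
        } catch (java.io.IOException e) {
            // Find the holder with e.g. 'netstat -tlnp' and kill it
            // before resubmitting the job.
            System.out.println("Port " + port + " is in use: " + e);
        }
    }
}
</pre>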
We do not have duplicate port assignments on our crawler machines,
and we have been using the same deploy settings for years.<br>
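For completeness, this is how we double-check that claim: scan the
deploy file for port values that occur more than once. The regular
expression assumes the port elements end in "Port" (guiPort,
jmxPort, ...), which is an assumption based on our settings, so
adjust it to your deploy file. If your deploy file covers several
machines, ports may legitimately repeat across machines, so apply
this per machine section:<br>
<pre>
// Report any port number that is assigned twice in the deploy XML.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDuplicatePorts {
    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), "UTF-8");
        Matcher m = Pattern.compile("Port>(\\d+)").matcher(xml);
        StringBuilder seen = new StringBuilder();
        while (m.find()) {
            String port = "," + m.group(1) + ",";
            if (seen.indexOf(port) >= 0) {
                System.out.println("Port assigned twice: " + m.group(1));
            }
            seen.append(port);
        }
    }
}
</pre>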
<br>
Does anyone have ideas for workarounds? That would be great,
because the "Heritrix Address already in use" bug in particular is
seriously disrupting our daily work.<br>
<br>
Regards<br>
a.<br>
<br>
<h1 id="summary-val"><br>
</h1>
</body>
</html>