[Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1

Colin Samuel Rosenthal csr at kb.dk
Wed Nov 8 12:55:45 CET 2017


I've created an issue https://sbforge.org/jira/browse/NAS-2682 for this and I have some suspicions about whose code might be responsible for the problem, although right now I can't see anything obviously wrong.


--
Colin Rosenthal PhD
Senior IT Consultant
Royal Danish Library (Aarhus)
________________________________
From: Netarchivesuite-devel <netarchivesuite-devel-bounces at ml.sbforge.org> on behalf of sara.aubry at bnf.fr <sara.aubry at bnf.fr>
Sent: Tuesday, July 11, 2017 11:18:10 AM
To: netarchivesuite-devel at ml.sbforge.org
Subject: Re: [Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1

Hi everyone,

Just a quick note to let you know that we launched a broad crawl test with 5.3.1 at the end of last week.
Everything went smoothly: we generated 872 jobs and ran 20 of them using 10 crawlers. Job statuses are consistent
and there is nothing wrong with the broker.

We have the following configuration:
- CentOS 7.3 (which seems to be similar to Red Hat 4.8)
- Java(TM) SE Runtime Environment (build 1.8.0_40-b25) 64-Bit
- OpenMQ (MessageQueue5.1)


Perhaps more importantly, we are using this configuration for the scheduler:
            <scheduler>
                <jobtimeouttime>31536000</jobtimeouttime>
                <jobgenerationperiode>60</jobgenerationperiode>
                <jobGen>
                    <class>dk.netarkivet.harvester.scheduler.jobgen.FixedDomainConfigurationCountJobGenerator</class>
                    <objectLimitIsSetByQuotaEnforcer>false</objectLimitIsSetByQuotaEnforcer>
                    <domainConfigSubsetSize>5000</domainConfigSubsetSize>
                    <config>
                        <fixedDomainCountSnapshot>5000</fixedDomainCountSnapshot>
                        <fixedDomainCountFocused>500</fixedDomainCountFocused>
                        <excludeDomainsWithZeroBudget>true</excludeDomainsWithZeroBudget>
                        <postponeUnregisteredChannel>false</postponeUnregisteredChannel>
                    </config>
                </jobGen>
            </scheduler>

If I remember correctly, at KB and ONB you are using a different job generator, one that tries to make jobs of homogeneous size based
on the previous harvest. The one we are using builds jobs by taking the domains in alphabetical order.

Hope this helps,

Sara



From:        <aponb at gmx.at>
To:        <netarchivesuite-devel at ml.sbforge.org>
Date:        29/06/2017 11:13
Subject:        Re: [Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1
Sent by:        Netarchivesuite-devel <netarchivesuite-devel-bounces at ml.sbforge.org>
________________________________



Hi Sara,

I forgot to mention that the problems came up with our daily crawls. The intention was to deploy 5.3.1 and wait for some daily crawls to run before starting the broad crawl.

Thanks for your settings, and for letting us know how your broad crawl works out!

Hi Andreas,

Are your problems coming up because you just launched a broad crawl?

At BnF, we are still running 5.3.0 with the default values for these parameters:
settings.harvester.harvesting.sendReadyInterval: 30 s
settings.harvester.harvesting.sendReadyDelay: 1000 ms
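For readers less familiar with the settings file, the two dotted names above can be read as nested XML elements, in the same way as the scheduler block quoted elsewhere in this thread. The fragment below is only a sketch: the element nesting is inferred from the dotted names rather than copied from a real settings file, and it assumes the interval is written in seconds and the delay in milliseconds, as the values above suggest.

```xml
<settings>
    <harvester>
        <harvesting>
            <!-- Assumed to be in seconds (30 s as reported above) -->
            <sendReadyInterval>30</sendReadyInterval>
            <!-- Assumed to be in milliseconds (1000 ms as reported above) -->
            <sendReadyDelay>1000</sendReadyDelay>
        </harvesting>
    </harvester>
</settings>
```

Check the units against the documentation for your NetarchiveSuite version before changing either value.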

We are currently testing 5.3.1 on very small crawls (working well)
and we will start bigger crawls next week. I'll let you know
how it goes.

Sara




From:        <aponb at gmx.at>
To:        <netarchivesuite-devel at ml.sbforge.org>
Date:        28/06/2017 11:43
Subject:        [Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1
Sent by:        Netarchivesuite-devel <netarchivesuite-devel-bounces at ml.sbforge.org>
________________________________



I was running NAS version 5.3.1 in production and got a huge
number of jobs submitted with the same configurations. This must be the
behavior described in https://sbforge.org/jira/browse/NAS-2614, which was
fixed for version 5.3.1. The strange thing is that I did not have any
problems in version 5.3.0.

Is anyone experiencing the same issue?
As suggested, I set settings.harvester.harvesting.sendReadyInterval to
300, and I am using settings.harvester.harvesting.sendReadyDelay with a
value of 300.

Also, the HarvestJobManagerApplication dies with an OutOfMemoryError,
even when started with the parameter -Xmx4096m:

20:28:11.823 ERROR d.n.c.lifecycle.PeriodicTaskExecutor - Task threw exception: java.lang.OutOfMemoryError: Java heap space
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_77]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_77]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.checkExecution(PeriodicTaskExecutor.java:171) [common-core-5.3.1.jar:UNKNOWN_REVISION]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.access$500(PeriodicTaskExecutor.java:47) [common-core-5.3.1.jar:UNKNOWN_REVISION]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor$1.run(PeriodicTaskExecutor.java:152) [common-core-5.3.1.jar:UNKNOWN_REVISION]

Do you have any thoughts on this?
Regards
a.

_______________________________________________
Netarchivesuite-devel mailing list
Netarchivesuite-devel at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel

