[Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1
sara.aubry at bnf.fr
Tue Jul 11 11:18:10 CEST 2017
Hi everyone,
Just a quick note to let you know that we launched a broad crawl
test with 5.3.1 at the end of last week.
Everything went smoothly: we generated 872 jobs and ran 20 of them using 10
crawlers, the job statuses are consistent,
and there is nothing wrong with the broker.
We have the following configuration:
- CentOS 7.3 (which seems to be similar to Red Hat 4.8)
- Java(TM) SE Runtime Environment (build 1.8.0_40-b25) 64-Bit
- OpenMQ (MessageQueue5.1)
Perhaps more importantly, we are using the following scheduler configuration:
<scheduler>
  <jobtimeouttime>31536000</jobtimeouttime>
  <jobgenerationperiode>60</jobgenerationperiode>
  <jobGen>
    <class>dk.netarkivet.harvester.scheduler.jobgen.FixedDomainConfigurationCountJobGenerator</class>
    <objectLimitIsSetByQuotaEnforcer>false</objectLimitIsSetByQuotaEnforcer>
    <domainConfigSubsetSize>5000</domainConfigSubsetSize>
    <config>
      <fixedDomainCountSnapshot>5000</fixedDomainCountSnapshot>
      <fixedDomainCountFocused>500</fixedDomainCountFocused>
      <excludeDomainsWithZeroBudget>true</excludeDomainsWithZeroBudget>
      <postponeUnregisteredChannel>false</postponeUnregisteredChannel>
    </config>
  </jobGen>
</scheduler>
If I remember correctly, at KB and ONB you are using a different job generator
that tries to produce homogeneous job sizes based
on the previous harvest. The one we are using builds jobs by taking the
domains in alphabetical order.
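As a rough illustration of the "fixed count, alphabetical order" idea described above, here is a minimal Java sketch. It is not the actual FixedDomainConfigurationCountJobGenerator code: the class name, method name and the *.example domains are hypothetical, and the real generator works on domain configurations and may apply further criteria when splitting jobs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not NetarchiveSuite code.
public class FixedCountJobSketch {

    /**
     * Sorts the domain names alphabetically and cuts the sorted list
     * into consecutive jobs of at most maxPerJob domains each.
     */
    public static List<List<String>> partition(List<String> domains, int maxPerJob) {
        if (maxPerJob <= 0) {
            throw new IllegalArgumentException("maxPerJob must be positive");
        }
        List<String> sorted = new ArrayList<>(domains);
        Collections.sort(sorted);
        List<List<String>> jobs = new ArrayList<>();
        for (int start = 0; start < sorted.size(); start += maxPerJob) {
            int end = Math.min(start + maxPerJob, sorted.size());
            // Copy the slice so each job owns its own list.
            jobs.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return jobs;
    }

    public static void main(String[] args) {
        // Made-up domain names, for illustration only.
        List<String> domains = Arrays.asList(
                "delta.example", "alpha.example", "echo.example",
                "charlie.example", "bravo.example");
        // With a limit of 2 domains per job, 5 domains give 3 jobs.
        List<List<String>> jobs = partition(domains, 2);
        for (int i = 0; i < jobs.size(); i++) {
            System.out.println("Job " + (i + 1) + ": " + jobs.get(i));
        }
    }
}
```

With a limit of 5000 as in fixedDomainCountSnapshot, the same slicing would simply produce one job per 5000 alphabetically consecutive entries, regardless of how large each domain turned out to be in the previous harvest.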
Hope this helps,
Sara
From: <aponb at gmx.at>
To: <netarchivesuite-devel at ml.sbforge.org>
Date: 29/06/2017 11:13
Subject: Re: [Netarchivesuite-devel] Multiple jobs submitted simultaneously
under 5.3.1
Sent by: Netarchivesuite-devel
<netarchivesuite-devel-bounces at ml.sbforge.org>
Hi Sara,
I forgot to mention that the problems came up with our daily
crawls. The intention was to deploy 5.3.1 and wait for some daily crawls
before starting the broad crawl.
Thanks for your settings and for telling us how your broad crawl will work!
Hi Andreas,
Are your problems coming up because you just launched a broad crawl?
At BnF, we are still running 5.3.0 with the default values for these
parameters:
settings.harvester.harvesting.sendReadyInterval set to 30s
settings.harvester.harvesting.sendReadyDelay set to 1000ms
We are currently testing 5.3.1 on very small crawls (working well)
and we will start bigger crawls next week. I'll let you know
how it goes.
Sara
From: <aponb at gmx.at>
To: <netarchivesuite-devel at ml.sbforge.org>
Date: 28/06/2017 11:43
Subject: [Netarchivesuite-devel] Multiple jobs submitted
simultaneously under 5.3.1
Sent by: Netarchivesuite-devel
<netarchivesuite-devel-bounces at ml.sbforge.org>
I was running NAS version 5.3.1 in production and got a huge
number of jobs with the same configurations submitted. This must be the
behavior of https://sbforge.org/jira/browse/NAS-2614 which was fixed for
version 5.3.1. The strange thing is that I did not have any problems in
version 5.3.0.
Is anyone experiencing the same issue?
As suggested, I set settings.harvester.harvesting.sendReadyInterval to
300 and I am using settings.harvester.harvesting.sendReadyDelay with
value 300.
Also, the HarvestJobManagerApplication dies with an OutOfMemoryError,
even when started with the parameter -Xmx4096m:
20:28:11.823 ERROR d.n.c.lifecycle.PeriodicTaskExecutor - Task threw exception: java.lang.OutOfMemoryError: Java heap space
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_77]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_77]
    at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.checkExecution(PeriodicTaskExecutor.java:171) [common-core-5.3.1.jar:UNKNOWN_REVISION]
    at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.access$500(PeriodicTaskExecutor.java:47) [common-core-5.3.1.jar:UNKNOWN_REVISION]
    at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor$1.run(PeriodicTaskExecutor.java:152) [common-core-5.3.1.jar:UNKNOWN_REVISION]
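For readers decoding the trace above: the ExecutionException comes from FutureTask.get(), which wraps whatever the background task threw, so the OutOfMemoryError happened inside the periodic task itself and not in the code that checks on it. The small Java sketch below reproduces that wrapping with a simulated error and also prints the maximum heap the JVM reports, which can help confirm that a -Xmx value really reached the process. The class and method names are hypothetical and none of this is NetarchiveSuite code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not NetarchiveSuite code.
public class WrappedErrorSketch {

    /**
     * Runs the task on a worker thread and returns the Throwable it
     * failed with, or null if it completed normally.
     */
    public static Throwable rootFailure(Callable<?> task) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = pool.submit(task);
            future.get(); // throws ExecutionException if the task threw anything
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original Throwable from the worker thread
        } finally {
            pool.shutdownNow();
        }
    }

    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated failure: no real heap exhaustion is triggered here.
        Throwable cause = rootFailure(() -> {
            throw new OutOfMemoryError("Java heap space (simulated)");
        });
        System.out.println("Cause seen by the caller: " + cause);
        System.out.println("Max heap reported by the JVM (MiB): " + maxHeapMiB());
    }
}
```

If the reported maximum heap is far below the value passed with -Xmx, the option is probably not being applied to the process that actually runs the application.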
Do you have any thoughts on this?
Regards
a.
_______________________________________________
Netarchivesuite-devel mailing list
Netarchivesuite-devel at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel
Exhibitions:
Le monde selon Topor - until 16 July 2017 - BnF -
François-Mitterrand
La bibliothèque, la nuit – Bibliothèques mythiques en réalité virtuelle -
until 13 August 2017 - BnF - François-Mitterrand
Please consider the environment before printing.