[Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1

sara.aubry at bnf.fr sara.aubry at bnf.fr
Tue Jul 11 11:18:10 CEST 2017


Hi everyone,

Just a quick note to let you know that we launched a broad crawl test
with 5.3.1 at the end of last week.
Everything went smoothly: we generated 872 jobs and ran 20 of them using
10 crawlers, the job statuses are consistent, and there is nothing wrong
with the broker.

We have the following configuration:
- CentOS 7.3 (which seems to be similar to Red Hat 4.8)
- Java(TM) SE Runtime Environment (build 1.8.0_40-b25) 64-Bit
- OpenMQ (MessageQueue5.1)


Perhaps more importantly, we are using this configuration for the scheduler:
    <scheduler>
        <jobtimeouttime>31536000</jobtimeouttime>
        <jobgenerationperiode>60</jobgenerationperiode>
        <jobGen>
            <class>dk.netarkivet.harvester.scheduler.jobgen.FixedDomainConfigurationCountJobGenerator</class>
            <objectLimitIsSetByQuotaEnforcer>false</objectLimitIsSetByQuotaEnforcer>
            <domainConfigSubsetSize>5000</domainConfigSubsetSize>
            <config>
                <fixedDomainCountSnapshot>5000</fixedDomainCountSnapshot>
                <fixedDomainCountFocused>500</fixedDomainCountFocused>
                <excludeDomainsWithZeroBudget>true</excludeDomainsWithZeroBudget>
                <postponeUnregisteredChannel>false</postponeUnregisteredChannel>
            </config>
        </jobGen>
    </scheduler>

If I remember correctly, at KB and ONB you are using a different job
generator that tries to make homogeneous job sizes based on the previous
harvest. The one we are using builds jobs by taking the domains in
alphabetical order.
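
To illustrate the alphabetical, fixed-count approach, here is a rough
sketch in Python. It is only an illustration under simple assumptions and
not the actual FixedDomainConfigurationCountJobGenerator code: the function
name split_into_jobs and the sample domain names are made up, and only the
limit of 5000 domains per job is taken from the settings above.

```python
from typing import List


def split_into_jobs(domains: List[str], domains_per_job: int) -> List[List[str]]:
    """Sort domains alphabetically and cut them into fixed-size jobs.

    Hypothetical helper for illustration; the real generator may apply
    further rules that are not described in this mail.
    """
    if domains_per_job <= 0:
        raise ValueError("domains_per_job must be positive")
    ordered = sorted(domains)
    return [
        ordered[start:start + domains_per_job]
        for start in range(0, len(ordered), domains_per_job)
    ]


# Made-up sample data: 12,000 domain names.
sample = [f"domain{n:05d}.example" for n in range(12000)]
jobs = split_into_jobs(sample, 5000)
print([len(job) for job in jobs])  # [5000, 5000, 2000]
```

With a fixed count, job sizes in bytes or objects can vary a lot, because
nothing in this scheme looks at how large each domain was in the previous
harvest.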

Hope this helps,

Sara



From:    <aponb at gmx.at>
To:      <netarchivesuite-devel at ml.sbforge.org>
Date:    29/06/2017 11:13
Subject: Re: [Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1
Sent by: Netarchivesuite-devel <netarchivesuite-devel-bounces at ml.sbforge.org>



Hi Sara,

I forgot to mention that the problems came up with our daily crawls. The
intention was to deploy 5.3.1 and wait for some daily crawls to run
before starting the broad crawl.

Thanks for your settings and for telling us how your broad crawl goes!

Hi Andreas,

Are your problems coming up because you just launched a broad crawl?

At BnF, we are still running 5.3.0 with the default values for these
parameters:
settings.harvester.harvesting.sendReadyInterval: 30s
settings.harvester.harvesting.sendReadyDelay: 1000ms
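
As a rough sketch of where these parameters would sit in a settings file,
the Python snippet below expands the dotted names into nested XML elements.
It assumes that each dot-separated part becomes one nested element; the
helper dotted_to_xml is hypothetical and not part of NetarchiveSuite. The
values are copied as written above, and the format the settings file
actually expects is not stated in this mail.

```python
import xml.etree.ElementTree as ET


def dotted_to_xml(settings: dict) -> ET.Element:
    """Build one nested XML tree from dotted setting names (illustrative only)."""
    if not settings:
        raise ValueError("no settings given")
    root = None
    for dotted_name, value in settings.items():
        parts = dotted_name.split(".")
        if root is None:
            root = ET.Element(parts[0])
        elif root.tag != parts[0]:
            raise ValueError("all settings must share the same root element")
        node = root
        # Walk down the tree, creating each missing intermediate element.
        for part in parts[1:]:
            child = node.find(part)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        node.text = str(value)
    return root


tree = dotted_to_xml({
    "settings.harvester.harvesting.sendReadyInterval": "30s",
    "settings.harvester.harvesting.sendReadyDelay": "1000ms",
})
print(ET.tostring(tree, encoding="unicode"))
```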

We are currently testing 5.3.1 on very small crawls (working well)
and we will start bigger crawls next week. I'll let you know
how it goes.

Sara 




From:    <aponb at gmx.at>
To:      <netarchivesuite-devel at ml.sbforge.org>
Date:    28/06/2017 11:43
Subject: [Netarchivesuite-devel] Multiple jobs submitted simultaneously under 5.3.1
Sent by: Netarchivesuite-devel <netarchivesuite-devel-bounces at ml.sbforge.org>



I was running NAS version 5.3.1 in production and got a huge number of
jobs with the same configurations submitted. This must be the behavior
described in https://sbforge.org/jira/browse/NAS-2614, which was fixed
for version 5.3.1. The strange thing is that I did not have any problems
in version 5.3.0.

Is anyone experiencing the same issue?
As suggested, I set settings.harvester.harvesting.sendReadyInterval to
300, and I am using settings.harvester.harvesting.sendReadyDelay with
the value 300.

Also, the HarvestJobManagerApplication dies with an OutOfMemoryError,
even when started with the parameter -Xmx4096m:

20:28:11.823 ERROR d.n.c.lifecycle.PeriodicTaskExecutor - Task threw exception: java.lang.OutOfMemoryError: Java heap space
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_77]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_77]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.checkExecution(PeriodicTaskExecutor.java:171) [common-core-5.3.1.jar:UNKNOWN_REVISION]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor.access$500(PeriodicTaskExecutor.java:47) [common-core-5.3.1.jar:UNKNOWN_REVISION]
        at dk.netarkivet.common.lifecycle.PeriodicTaskExecutor$1.run(PeriodicTaskExecutor.java:152) [common-core-5.3.1.jar:UNKNOWN_REVISION]

Do you have any thoughts on this?
Regards
a.

_______________________________________________
Netarchivesuite-devel mailing list
Netarchivesuite-devel at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel

