[Netarchivesuite-users] how do I configure NS to have a minimum or fixed value of 3500 domains per job?

sara.aubry at bnf.fr sara.aubry at bnf.fr
Mon Mar 8 16:01:55 CET 2010


Søren,

is there a way (even a tricky one) to set these parameters to be close to 
3500 domains?

Sara








Message from: Søren Vejrup Carlsen <svc at kb.dk> 
                      08/03/2010 15:55

Sent by: 
<netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk>

Please reply to 
<netarchivesuite-users at lists.gforge.statsbiblioteket.dk>



To
"netarchivesuite-users at lists.gforge.statsbiblioteket.dk" 
<netarchivesuite-users at lists.gforge.statsbiblioteket.dk>
Cc
"bert.wendland at bnf.fr" <bert.wendland at bnf.fr>, "nicolas.giraud at bnf.fr" 
<nicolas.giraud at bnf.fr>, "PAUL.FIEVRE at bnf.fr" <PAUL.FIEVRE at bnf.fr>
Subject
Re: [Netarchivesuite-users] how do I configure NS to have a minimum or 
fixed value of 3500 domains per job?



Hi Sara.
> is there a way to set these parameters to have a minimum or fixed value
> of domains per job?
No, there isn't currently, but it would probably be a good idea.

/Søren
-----Original Message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk 
[mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On 
Behalf Of sara.aubry at bnf.fr
Sent: 8 March 2010 15:43
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Cc: bert.wendland at bnf.fr; nicolas.giraud at bnf.fr; PAUL.FIEVRE at bnf.fr
Subject: [Netarchivesuite-users] how do I configure NS to have a minimum or 
fixed value of 3500 domains per job?

Hello everyone,

We are trying to configure the job-generation part of NetarchiveSuite so 
that each job contains 3500 domains/configurations.
After we set configChunkSize to 3500, NS created jobs with either 1000 
or 2500 configurations in the first stage and only 98 configurations in 
the second stage, which really slows down our crawl (after a few 
hours, we have many jobs with only a few active queues, and these can 
last several days...).

We re-read the configuration manual: smart guesses are made about target 
size, but is there a way to set these parameters to have a minimum or fixed 
value of domains per job?

Thanks for your help!

Sara

/////////////////////////////////////


Here is our current scheduler configuration:

<scheduler>
        <errorFactorPrevResult>10</errorFactorPrevResult>
        <errorFactorBestGuess>20</errorFactorBestGuess>
        <expectedAverageBytesPerObject>38000</expectedAverageBytesPerObject>
        <maxDomainSize>5000</maxDomainSize>
        <jobs>
                <maxRelativeSizeDifference>100</maxRelativeSizeDifference>
                <minAbsoluteSizeDifference>2000</minAbsoluteSizeDifference>
                <maxTotalSize>500000</maxTotalSize>
        </jobs>
        <configChunkSize>3500</configChunkSize>
        <splitByObjectLimit>true</splitByObjectLimit>
</scheduler>
 




Consider the environment before printing this mail. 

_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users









