[Netarchivesuite-users] how do I configure NS to have a minimum or fixed value of 3500 domains per job?
Søren Vejrup Carlsen
svc at kb.dk
Mon Mar 8 15:55:37 CET 2010
Hi Sara.
>is there a way to set these parameters to have a minimum or fixed value of domains per job?
No, there isn't currently, but it would probably be a good idea.
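To illustrate why configChunkSize only acts as an upper limit and not as a target, here is a simplified sketch in Python. It is not the NetarchiveSuite code: the function name, the greedy packing rule and the example numbers (3500 configurations expecting 200 objects each) are assumptions made for illustration. Only the parameter values are taken from the scheduler settings quoted below.

```python
def split_into_jobs(expected_sizes, chunk_size, max_total_size,
                    max_relative_diff, min_absolute_diff):
    """Hypothetical model of job splitting; not the real NS algorithm.

    Configurations are handled chunk_size at a time. Within a chunk they
    are sorted by expected size and packed greedily; a new job is started
    when adding a configuration would break one of the limits.
    """
    jobs = []
    for start in range(0, len(expected_sizes), chunk_size):
        chunk = sorted(expected_sizes[start:start + chunk_size])
        current, total = [], 0
        for size in chunk:
            if current:
                smallest = current[0]  # chunk is sorted ascending
                too_big = total + size > max_total_size
                # Size differences below min_absolute_diff are ignored.
                too_different = (size - smallest > min_absolute_diff
                                 and size > smallest * max_relative_diff)
                if too_big or too_different:
                    jobs.append(current)
                    current, total = [], 0
            current.append(size)
            total += size
        if current:
            jobs.append(current)
    return jobs


# Assumed input: 3500 configurations, each expected to yield 200 objects.
jobs = split_into_jobs([200] * 3500, chunk_size=3500,
                       max_total_size=500000,
                       max_relative_diff=100, min_absolute_diff=2000)
print([len(job) for job in jobs])  # [2500, 1000]
```

In this model the chunk of 3500 is cut as soon as the expected total reaches maxTotalSize, so no job ever contains more than configChunkSize configurations, but it can contain far fewer.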
/Søren
-----Original Message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On Behalf Of sara.aubry at bnf.fr
Sent: 8 March 2010 15:43
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Cc: bert.wendland at bnf.fr; nicolas.giraud at bnf.fr; PAUL.FIEVRE at bnf.fr
Subject: [Netarchivesuite-users] how do I configure NS to have a minimum or fixed value of 3500 domains per job?
Hello everyone,
We are trying to configure the job-generation part of NetarchiveSuite so that each job contains 3500 domains/configurations.
After we set configChunkSize to 3500, NS created jobs with either 1000 or 2500 configurations in the first stage and only 98 configurations in the second stage. This really slows down our crawl: after a few hours, we are left with many jobs that have only a few active queues and can last several days.
We re-read the configuration manual and understand that smart guesses are made about target size, but is there a way to set these parameters to have a minimum or fixed value of domains per job?
Thanks for your help!
Sara
/////////////////////////////////////
Here is our current scheduler configuration:
<scheduler>
    <errorFactorPrevResult>10</errorFactorPrevResult>
    <errorFactorBestGuess>20</errorFactorBestGuess>
    <expectedAverageBytesPerObject>38000</expectedAverageBytesPerObject>
    <maxDomainSize>5000</maxDomainSize>
    <jobs>
        <maxRelativeSizeDifference>100</maxRelativeSizeDifference>
        <minAbsoluteSizeDifference>2000</minAbsoluteSizeDifference>
        <maxTotalSize>500000</maxTotalSize>
    </jobs>
    <configChunkSize>3500</configChunkSize>
    <splitByObjectLimit>true</splitByObjectLimit>
</scheduler>