[Netarchivesuite-users] how do I configure NS to have a minimum or fixed value of 3500 domains per job?

sara.aubry at bnf.fr sara.aubry at bnf.fr
Mon Mar 8 15:43:03 CET 2010


Hello everyone,

We are trying to configure the job-generation part of NetarchiveSuite (NS)
so that each job contains 3500 domains/configurations.
After we set configChunkSize to 3500, NS created jobs with either 1000
or 2500 configurations in the first stage and only 98 configurations in
the second stage.
This really slows down our crawl: after a few hours, we are left with many
jobs that have only a few active queues and can last several days.

We re-read the configuration manual and understand that smart guesses are
made about target size, but is there a way to set these parameters so that
each job gets a minimum or fixed number of domains?

Thanks for your help!

Sara

/////////////////////////////////////


Here is our current scheduler configuration:

<scheduler>
    <errorFactorPrevResult>10</errorFactorPrevResult>
    <errorFactorBestGuess>20</errorFactorBestGuess>
    <expectedAverageBytesPerObject>38000</expectedAverageBytesPerObject>
    <maxDomainSize>5000</maxDomainSize>
    <jobs>
        <maxRelativeSizeDifference>100</maxRelativeSizeDifference>
        <minAbsoluteSizeDifference>2000</minAbsoluteSizeDifference>
        <maxTotalSize>500000</maxTotalSize>
    </jobs>
    <configChunkSize>3500</configChunkSize>
    <splitByObjectLimit>true</splitByObjectLimit>
</scheduler>
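
To make the question more concrete, here is a minimal Python sketch of the kind of splitting we suspect might be happening inside one chunk. It is only a toy model under our own assumptions, not NetarchiveSuite's actual job generator: the `Config` class, the `split_chunk` function, the greedy grouping rule, and the estimate of 200 objects per domain are all hypothetical. Only the three limit values are taken from our configuration above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    """Hypothetical stand-in for one domain configuration."""
    domain: str
    expected_objects: int


def split_chunk(configs: List[Config],
                max_total_size: int,
                max_relative_diff: int,
                min_absolute_diff: int) -> List[List[Config]]:
    """Greedily group one chunk of configurations into jobs.

    Assumed rule (NOT confirmed against NetarchiveSuite): a new job is
    started when adding a configuration would push the job's expected
    total over max_total_size, or when its expected size differs too
    much from the smallest configuration already in the job.
    """
    jobs: List[List[Config]] = []
    current: List[Config] = []
    total = 0
    # Sorting by expected size keeps similar configurations together.
    for cfg in sorted(configs, key=lambda c: c.expected_objects):
        if current:
            smallest = current[0].expected_objects
            diff = cfg.expected_objects - smallest
            too_different = (diff > min_absolute_diff and
                             cfg.expected_objects > max_relative_diff * smallest)
            too_big = total + cfg.expected_objects > max_total_size
            if too_different or too_big:
                jobs.append(current)
                current, total = [], 0
        current.append(cfg)
        total += cfg.expected_objects
    if current:
        jobs.append(current)
    return jobs


# One chunk of 3500 configurations, each with an invented estimate of
# 200 expected objects, split using the limits from our <jobs> settings.
chunk = [Config(f"domain{i}.example", 200) for i in range(3500)]
jobs = split_chunk(chunk, max_total_size=500000,
                   max_relative_diff=100, min_absolute_diff=2000)
job_sizes = [len(job) for job in jobs]
print(job_sizes)
```

In this toy model, configChunkSize would only cap how many configurations are considered together, while a total-size limit decides how many end up in each job. If NS works along these lines, that could be why a chunk of 3500 does not give jobs of 3500 domains, but we do not know whether this matches the real behaviour.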
 






