[Netarchivesuite-devel] troubleshooting deduplication

Mikis Seth Sørensen mss at statsbiblioteket.dk
Mon Oct 3 11:41:03 CEST 2011


Hi Sara

Sounds good that the indexing is now running smoothly.

Regarding the partial job generation, this sounds like odd behaviour.
Have you tried restarting the HarvestJobManager, perhaps with a higher log
level for the HarvestJobGenerator?

Regards
Mikis 

On 03/10/11 09.21, "sara.aubry at bnf.fr" <sara.aubry at bnf.fr> wrote:

>Hi all, and many thanks for your answers.
>We have upgraded our IndexServer following your configurations and built
>a virtual machine with:
>- 4 Intel Xeon 2.4 GHz CPUs
>- 32 GB RAM
>- 3 Fibre Channel raw devices of 1 TB each, merged into a 3 TB ext3 partition
>We finally managed to build our index (over a 23 TB archive) in 4
>days, so that's better than 26 :-)
>
>Another question: has job generation been modified in the 3.16 release?
>We activated a snapshot harvest on Friday; only 297 jobs were created, all
>with the "New" status (generation stopped at the letter c), and we found
>no errors in the HarvestJobManagerApplication log.
>
>Best,
>
>Sara
>
>Message from: <aponb at gmx.at>
>Date: 28/09/2011 14:44
>Sent by: <netarchivesuite-devel-bounces at ml.sbforge.org>
>Reply to: <netarchivesuite-devel at ml.sbforge.org>
>To: <netarchivesuite-devel at ml.sbforge.org>
>Cc:
>Subject: Re: [Netarchivesuite-devel] troubleshooting deduplication
>
>> 1) Could you tell us the configuration of your index server (CPU,
>> RAM, local disk space vs. NFS partition), and how long your
>> deduplication process took for how much data?
>Our Index Server is running on a 2x Intel(R) Core(TM)2 Duo CPU E4500
>@ 2.20GHz machine with 4 GB RAM and 500 GB disk space.
>We just finished the 1st stage of our current domain crawl and we are
>going to start the 2nd stage soon. I will have a look at the duration of
>the deduplication process and let you know afterwards.
>> 2) Is it possible (have you ever tested it) to generate a deduplication
>> index in a test environment and use it in your production environment?
>No, we never tried that.
>
>Regards
>a.
>_______________________________________________
>Netarchivesuite-devel mailing list
>Netarchivesuite-devel at ml.sbforge.org
>http://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel