[Netarchivesuite-devel] troubleshooting deduplication

sara.aubry at bnf.fr sara.aubry at bnf.fr
Fri Sep 16 14:02:17 CEST 2011


Hello everyone,

As I mentioned during our last teleconference, we are testing 
NetarchiveSuite 3.16.1 and a new architecture to launch our annual broad 
crawl.

We activated the harvest on August 23 (almost 3 weeks ago!) and the 
deduplication index is still not ready!

1) Could you tell us the configuration of your index server (CPU, RAM, 
local disk space vs. NFS partition), how long your deduplication process 
took, and for how much data?

2) Is it possible (and have you ever tested this) to generate a 
deduplication index in a test environment and then use it in your 
production environment?
We hope to be able to finish our deduplication process and use the index...

3) When a job starts, how does the index server know that an index has 
already been created?

Many thanks for your answers.

Sara
 


