[Netarchivesuite-users] Example configuration for a single-site setup

nicolas.giraud at bnf.fr
Tue Mar 3 17:36:01 CET 2009


We are in the process of deploying NetarchiveSuite to drive our harvest
definitions and crawls at the French National Library. So far I have set up
a one-machine, sandbox-type installation of the suite, based on the
simple_harvest environment. I have read the Installation Manual several
times, but I believe it contains too much theoretical information and too
few examples. I am very confused about where to start. I would like to see
some example configuration files for a single-site setup scenario; that
would make things a lot clearer.

Let me explain quickly what we intend to do. Currently our ARC files are
located on data nodes, Petaboxes that were delivered to us by the
Internet Archive. We are now moving toward running our crawls autonomously,
so our setup would basically have:

- multiple machines to store ARC files (without redundancy)
- multiple machines to host Heritrix crawlers and perform indexing
- one machine handling the definition of harvests

First I would like to set up a development environment with 3 machines
(either 3 physical machines or virtual machines):

- one harvest definition machine
- one crawler machine
- one storage machine

I would use MySQL as the database.
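
To make the intended development layout concrete, here is a rough sketch
of it as plain Python data. This is only an illustration of the three
roles, not NetarchiveSuite configuration syntax; the host names, role
labels, and helper functions are all placeholders I made up.

```python
from dataclasses import dataclass

# Role labels are hypothetical names for the three kinds of machine
# described above; they are not NetarchiveSuite identifiers.
REQUIRED_ROLES = {"harvest-definition", "crawler", "storage"}


@dataclass(frozen=True)
class Machine:
    host: str  # placeholder host name
    role: str  # one of REQUIRED_ROLES


# Development environment: one machine (physical or virtual) per role.
DEV_LAYOUT = [
    Machine("hd.example.org", "harvest-definition"),
    Machine("crawler.example.org", "crawler"),
    Machine("storage.example.org", "storage"),
]

# Database backend I intend to use.
DATABASE = "mysql"


def missing_roles(layout):
    """Return the required roles that no machine in the layout covers."""
    return REQUIRED_ROLES - {m.role for m in layout}


def hosts_by_role(layout):
    """Group host names by role, so several crawlers or storage nodes
    can be listed later without changing the structure."""
    grouped = {}
    for m in layout:
        grouped.setdefault(m.role, []).append(m.host)
    return grouped
```

The production setup would use the same three roles, with several
machines listed under "crawler" and "storage".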

Is it possible to have sample configuration files, or some kind of tutorial,
for such a deployment environment? I think I can work something out
starting from the simple_harvest setup, but I'd appreciate some guidance to
proceed faster.

Thanks in advance,

Nicolas Giraud
