[Netarchivesuite-users] Example configuration for a single-site setup

Søren Vejrup Carlsen svc at kb.dk
Wed Mar 4 17:05:02 CET 2009

Hi Nicolas.

I agree that the installation manual is too theoretical. Many would go further and call it unusable.

To address this very weak point of NetarchiveSuite, we have spent the last few months working on a new Deploy system that will be bundled with the next release, NetarchiveSuite 3.8, hopefully available in a month or so.


On the page http://netarchive.dk/suite/Documentation, the links at the bottom of the page refer to the latest revision of the code in our repository.


If you are willing to try it now, we can send you a recent build of our code, or allow you to download our source code directly from our svn repository.


We don't allow anonymous checkout, so in that case you would have to sign up for an account on



and then go to https://gforge.statsbiblioteket.dk/project/request.php?group_id=7

to ask us to let you join the project.


Søren Vejrup Carlsen, NetarchiveSuite developer

Department of Digital Preservation, Royal Library, Copenhagen, Denmark 
tlf: (+45) 33 47 48 41
email: svc at kb.dk
Non omnia possumus omnes
--- Macrobius, Saturnalia, VI, 1, 35 -------

From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk On behalf of nicolas.giraud at bnf.fr
Sent: 3 March 2009 17:36
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Example configuration for a single-site setup



We are in the process of deploying NetarchiveSuite to drive our harvest definitions and crawls at the French National Library. So far I have set up a one-machine, sandbox-type installation of the suite, based on the simple_harvest environment. I have read the Installation Manual multiple times, but I really believe there is too much theoretical information there and too few examples. I am very confused as to where to start. I would like to see some example configuration files for a single-site setup scenario; that would make things much clearer.

Let me quickly explain what we intend to do. Currently our ARC files are located on data nodes, Petaboxes that were delivered to us by the Internet Archive. Now we are moving towards running our crawls autonomously. So basically our setup would have:

- multiple machines to store ARC files (without redundancy)
- multiple machines to host Heritrix crawlers and perform indexing
- one machine handling the definition of harvests

First I would like to set up a development environment with 3 machines (either 3 physical machines or using virtualization):

- one harvest definition machine
- one crawler machine
- one storage machine

I would use MySQL for the database.

Is it possible to have sample configuration files, or some kind of tutorial, for such a deployment environment? I think I can work something out starting from the simple harvest setup, but I'd appreciate some guidance to proceed faster.

Thanks in advance,

Nicolas Giraud

Consider the environment before printing this mail.
