[Netarchivesuite-users] Experiencing difficulties with QuickStartsystem

Søren Vejrup Carlsen svc at kb.dk
Wed Sep 10 12:23:54 CEST 2008


Hi Nicolas.
>Second issue: I can't get access to the Heritrix admin console. I have tried to edit /netarchivesuite/nas-3.6.0/conf/settings.xml and
>change the settings.harvester.harvesting.heritrix.guiPort, but whatever value I tried, http://localhost:<guiPort> would return an error 404.
>I have noticed, however, a Heritrix process on port 8092...
The cause of the second problem is that you are editing the wrong settings.xml.
The settings.xml used by the quickstart system is the file "scripts/simple_harvest/settings.xml".
The value for guiPort in that file is 8090.
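
To check which guiPort a given settings.xml actually specifies, something like the following may help. It is only a sketch: it assumes the dotted name settings.harvester.harvesting.heritrix.guiPort maps to nested XML elements of the same names, and the SAMPLE text is a hypothetical stand-in, not the real contents of scripts/simple_harvest/settings.xml. The helper name find_setting is made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop a "{namespace}" prefix in case the file declares a default namespace.
    return tag.rsplit("}", 1)[-1]


def find_setting(xml_text, dotted_path):
    """Return the text of the element addressed by a dotted name, or None."""
    node = ET.fromstring(xml_text)
    parts = dotted_path.split(".")
    if local_name(node.tag) != parts[0]:
        return None
    for part in parts[1:]:
        # Descend one level: pick the first child whose local name matches.
        node = next((c for c in node if local_name(c.tag) == part), None)
        if node is None:
            return None
    return (node.text or "").strip()


# Hypothetical minimal stand-in for a settings.xml (assumed layout).
SAMPLE = """<settings>
  <harvester><harvesting><heritrix>
    <guiPort>8090</guiPort>
  </heritrix></harvesting></harvester>
</settings>"""

if __name__ == "__main__":
    print(find_setting(SAMPLE, "settings.harvester.harvesting.heritrix.guiPort"))
```

To run it against a real file, read the file's text and pass it in place of SAMPLE.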
 
/Søren

-----Original Message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk]On Behalf Of nicolas.giraud at bnf.fr
Sent: Tuesday, September 09, 2008 6:30 PM
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Experiencing difficulties with QuickStartsystem



Hi, 

I am currently trying to deploy NetArchiveSuite 3.6.0 at the French National Library for evaluation purposes. I am toying with the quickstart system, but I have a couple of problems. 

First, I can't get the crawl jobs to work because I did not manage to have the proxy settings taken into account. My environment is installed on a Debian Etch system, in /netarchivesuite/nas-3.6.0. I have used the "Edit Harvest Templates" UI to create a new template based on host_10levels_orderxml. The only things I changed are the following lines: 

<string name="http-proxy-host"/> changed to <string name="http-proxy-host">fw_in.bnf.fr</string> 
<string name="http-proxy-port">8080</string> changed to <string name="http-proxy-port">8080</string> 
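
The same edit can be scripted rather than done by hand. This is a minimal sketch: the wrapper element in FRAGMENT and the function name apply_proxy are hypothetical, and a real order.xml template contains far more than these two elements. Only the two string elements and the host and port values come from the lines above.

```python
import xml.etree.ElementTree as ET


def apply_proxy(xml_text, host, port):
    """Set the http-proxy-host and http-proxy-port strings; return the new XML."""
    root = ET.fromstring(xml_text)
    wanted = {"http-proxy-host": host, "http-proxy-port": port}
    for elem in root.iter("string"):
        name = elem.get("name")
        if name in wanted:
            elem.text = wanted[name]
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment: only the two <string> elements are taken from the template.
FRAGMENT = """<fragment>
  <string name="http-proxy-host"/>
  <string name="http-proxy-port">8080</string>
</fragment>"""

if __name__ == "__main__":
    print(apply_proxy(FRAGMENT, "fw_in.bnf.fr", "8080"))
```

The output would still have to be uploaded as a new template through the "Edit Harvest Templates" UI.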

I then created a new template by uploading the modified file. I have created a configuration using this template for the domains I wish to harvest. However, this does not seem to be taken into account by the Heritrix crawler, so the jobs terminate with the "Domain Completed" status. 

Second issue: I can't get access to the Heritrix admin console. I have tried to edit /netarchivesuite/nas-3.6.0/conf/settings.xml and change the settings.harvester.harvesting.heritrix.guiPort, but whatever value I tried, http://localhost:<guiPort> would return an error 404. I have noticed, however, a Heritrix process on port 8092... 

Last issue: is there a way to shut down the system without losing all defined domains and harvests? 

Best regards, 

Nicolas Giraud
