[Netarchivesuite-users] more upload errors

Bjarne Andersen netarkivet at statsbiblioteket.dk
Wed Jul 30 08:58:25 CEST 2008

The problem was the viewerproxy application.

Each application running on the same server needs a unique port number in the setting: -Dsettings.common.http.port=
    - There are two reasons for this:
      a) The port number is used in the naming of the JMS queue
      b) The port number is the port that the server applications (GUIApplication and ViewerproxyApplication) LISTEN on
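
To illustrate point a): judging only from the queue names in the imqcmd listing quoted further down (for example DEV_COMMON_THIS_HACO_127_0_1_1_8081), the per-application queue name appears to be built from the environment name, a fixed prefix, the host IP with dots replaced by underscores, and the HTTP port. The sketch below reproduces that pattern. It is an assumption based on the listing, not the actual NetarchiveSuite code, and the function name queue_name and its arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild a queue name of the form seen in the imqcmd
# listing in this thread. Not taken from the NetarchiveSuite source.
# Usage: queue_name ENV IP PORT
queue_name() {
    env_name=$1
    ip=$2
    port=$3
    # Replace the dots in the IP address with underscores.
    ip_part=$(printf '%s' "$ip" | tr '.' '_')
    printf '%s_COMMON_THIS_HACO_%s_%s\n' "$env_name" "$ip_part" "$port"
}

# Two applications on the same host with the same port get the same name.
queue_name DEV 127.0.1.1 8081
queue_name DEV 127.0.1.1 8081
# A different port gives a different name.
queue_name DEV 127.0.1.1 8076
```

If the naming really works this way, the first two calls show why two applications with the same port on one host end up sharing a queue.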

Your problem was that both your harvester and your viewerproxy had the same -Dsettings.common.http.port=8081.
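
A quick way to spot this kind of clash is to list the http.port values that occur more than once in your startup scripts. The following is only a sketch: the function name duplicate_http_ports is made up, the demo file stands in for your real scripts, and it relies on grep supporting -o and -h (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch: print every -Dsettings.common.http.port value that appears more
# than once across the given startup scripts.
# duplicate_http_ports is a made-up name for illustration.
duplicate_http_ports() {
    # -o prints only the matching part, -h leaves out the file names.
    grep -oh 'settings\.common\.http\.port=[0-9][0-9]*' "$@" \
        | cut -d= -f2 \
        | sort \
        | uniq -d
}

# Demo on a throwaway file with two option lines modelled on the init script.
demo=$(mktemp)
cat > "$demo" <<'EOF'
export APP_OPTIONS="-Dsettings.common.http.port=8081"
export APP_OPTIONS="-Dsettings.common.http.port=8081 -Dsettings.viewerproxy.baseDir=viewerproxy_8081"
EOF
duplicate_http_ports "$demo"
rm -f "$demo"
```

Pass all your scripts at once (the init script and harvester.sh, for instance). A hit still needs a human look: in the quoted init script the SideKick also carries port 8081, apparently to match the harvester it watches.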

A bug (in my opinion) is that 3 different applications use "COMMON_THIS_HACO_" in the naming of their JMS-queue:
  - Harvester (seems OK - HACO means HarvesterController)
  - Viewerproxy (NOT OK - could be something like VIEW)
  - Indexserver (NOT OK - could be something like INDX)
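
Because of the shared prefix, a clash shows up as more than one consumer on a THIS_HACO queue. The sketch below filters the output of the "imqcmd list dst" command quoted later in this thread for such queues. It assumes the column order shown there (Name, Type, State, Producers, Consumers, ...) and one destination per line; other Message Queue versions may print different columns. The function name queues_with_extra_consumers is made up.

```shell
#!/bin/sh
# Sketch: read "imqcmd list dst" output on stdin and print the per-application
# (_THIS_) queues that have more than one consumer, with the consumer count.
# Assumed column order: Name Type State Producers Consumers ...
queues_with_extra_consumers() {
    awk '$2 == "Queue" && $1 ~ /_THIS_/ && $5 + 0 > 1 { print $1, $5 }'
}

# Demo with two rows copied from the listing in this thread.
queues_with_extra_consumers <<'EOF'
DEV_COMMON_THIS_HACO_127_0_1_1_8076     Queue  RUNNING  0          1          0            0      0.0
DEV_COMMON_THIS_HACO_127_0_1_1_8081     Queue  RUNNING  1          2          0            0      0.0
EOF
```

Against a live broker the command from this thread can be piped straight in: /opt/sun/mq/bin/imqcmd list dst -u admin -passfile $PASSFILE | queues_with_extra_consumers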

Martin Bella wrote:
> I thought it was not necessary to post what the harvester.sh script looks like. The
> first reason is, it was created by copy&paste method and it contains the same
> lines for starting the HarvesterControllerApplication as the netarchive.sh
> script. Probably I could do it the same way as you. I will play with this
> later. The second reason is, when I make a change, I almost always start with a
> clean install. And it fails.
> If you are interested, I think in our organisation we could create a special
> limited account for you on one of our testing machines so that you could have a
> closer look at this problem.
> Best,
> Martin Bella
> University Library in Bratislava
>>It might be a problem with the HarvesterControllerApplication and the
>>The HarvesterControllerApplication is implemented in such a way that it
>>destroys itself after it has finished a job. The SideKick makes sure that
>>that specific harvester is restarted. So each HarvesterControllerServer has
>>a corresponding SideKick.
>>The SideKick must know of a script to restart the harvester - you have to
>>make sure that the way you start the harvester in your init-script 
>>is the same way the SideKick starts it after each job - preferably by
>>calling the same script starting the HarvesterControllerServer.
>>So you might want to put the startup of the HarvesterControllerServer in
>>its own little script and call that from your init script (remember 
>>your global set variables)
>>In our startup-scripts (automatically generated with the DeployApplication)
>>one specific harvester has the following start-script:
>>cd /home/prod/PROD
>>java -Xmx1536m  -Dsettings.harvester.harvesting.heritrix.guiPort=8095 
>>-Djava.util.logging.config.file=/home/prod/PROD/conf/log_harvestcontrollerapplication.prop -Dsettings.common.jmx.port=8110 
>>-Dsettings.common.jmx.rmiPort=8210 -Dsettings.common.jmx.passwordFile=/home/prod/PROD/conf/jmxremote.password -Djava.security.manager 
>>dk.netarkivet.harvester.harvesting.HarvestControllerApplication < /dev/null > start_harvester_8081.sh.log 2>&1 &
>>And the matching SideKick has the following startup-script:
>>java -Xmx1536m -Ddk.netarkivet.settings.file=/home/prod/PROD/conf/settings_harvester_8081.xml 
>>-Dsettings.common.jmx.port=8111 -Dsettings.common.jmx.rmiPort=8211 
>>./conf/start_harvester_8081.sh  < /dev/null > start_sidekick_8081.sh.log 
>>2>&1 &
>>You can see that the SideKick has an argument which is exactly the
>>startup-script of the HarvesterControllerServer.
>>I can see that your init-script references /home/user/workspace/netarchive/harvester.sh - put the startup of the Harvester in that file.
>>Even given your current setup I would imagine that if things fail, they
>>should not fail until the second job is run, because of the
>>potentially failed restart of the HarvesterControllerServer.
>>So try with a clean install - clean JMS-broker and see the imqcmd output
>>before doing any jobs - should only have one consumer on the 
>>Harvester queue
>>Martin Bella wrote:
>>>I do not think I have any java processes from old installations. At the time of
>>>uploading the ps command showed only one instance of JMS Broker, BitarchiveApplication, GUIApplication, ArcRepositoryApplication, BitarchiveMonitorApplication, HarvestControllerApplication, SideKick, IndexServerApplication and ViewerProxyApplication.
>>>Btw. here is what my init script on my testing machine looks like:
>>>export NetarchiveDir=/home/user/workspace/netarchive
>>>export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.harvester.jar
>>>export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.archive.jar
>>>export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.viewerproxy.jar
>>>export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.monitor.jar
>>>export JAVA_OPTS=-Xmx2048m
>>>export LOG_SETTINGS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djava.util.logging.config.file=$NetarchiveDir/conf/log.prop"
>>>cd $NetarchiveDir
>>># Bitarchive machines
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8150 -Dsettings.common.jmx.rmiPort=8250"
>>>export APP_OPTIONS="-Dsettings.archive.bitarchive.thisLocation=sos"
>>>export APP=dk.netarkivet.archive.bitarchive.BitarchiveApplication
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>># Admin machine
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8100 -Dsettings.common.jmx.rmiPort=8200"
>>>export APP=dk.netarkivet.common.webinterface.GUIApplication
>>>export SETTING="-Dsettings.common.remoteFile.port=5440"
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8120 -Dsettings.common.jmx.rmiPort=8220"
>>>export APP=dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
>>>export SETTING="-Dsettings.common.remoteFile.port=5441"
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8110 -Dsettings.common.jmx.rmiPort=8210"
>>>export APP_OPTIONS="-Dsettings.common.archive.bitarchive.thisLocation=sos"
>>>export SETTING="-Dsettings.common.remoteFile.port=5443"
>>>export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication 
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>># Harvester machines
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8130 -Dsettings.common.jmx.rmiPort=8230"
>>>export APP_OPTIONS="-Dsettings.harvester.harvesting.queuePriority=HIGHPRIORITY"
>>>export SETTING="-Dsettings.common.remoteFile.port=5444"
>>>export APP=dk.netarkivet.harvester.harvesting.HarvestControllerApplication
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8140 -Dsettings.common.jmx.rmiPort=8240"
>>>export APP_OPTIONS="-Dsettings.common.http.port=8081"
>>>export APP=dk.netarkivet.harvester.sidekick.SideKick
>>>export APP_ARGS1=dk.netarkivet.harvester.sidekick.HarvestControllerServerMonitorHook
>>>export APP_ARGS2=/home/user/workspace/netarchive/harvester.sh
>>>export SETTING="-Dsettings.common.remoteFile.port=5445"
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>># Access servers
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8160 -Dsettings.common.jmx.rmiPort=8260"
>>>export APP=dk.netarkivet.archive.indexserver.IndexServerApplication
>>>export SETTING="-Dsettings.common.remoteFile.port=5446"
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP
>>>export JMX_SETTINGS="-Dsettings.common.jmx.port=8170 -Dsettings.common.jmx.rmiPort=8270"
>>>export APP_OPTIONS="-Dsettings.common.http.port=8081 -Dsettings.viewerproxy.baseDir=viewerproxy_8081 -Dsettings.archive.bitarchive.thisLocation=sos"
>>>export APP=dk.netarkivet.viewerproxy.ViewerProxyApplication
>>>export SETTING="-Dsettings.common.remoteFile.port=5447"
>>>/opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS
>>>>Thanks. This seems to show the problem:
>>>>the JMS-queue: DEV_COMMON_THIS_HACO_127_0_1_1_8081 has registered 2
>>>>consumers so another harvester-instance is also running and eating the 
>>>>"Store OK" Messages sent from your ARCRepository.
>>>>You should check that you do not have running processes from old
>>>>installations - e.g. use "ps fax | grep java"
>>>>The other errors look like some startup / shutdown problems with
>>>>NetarchiveSuite and the JMS-broker. Make sure that the JMS-broker is 
>>>>running (and cleaned up) before starting any applications. And make sure
>>>>that the JMS-broker is running when you try to stop applications 
>>>>since they will do a clean disconnect from the JMS broker.
>>>>Martin Bella wrote:
>>>>>Hi Bjarne,
>>>>>here is the output of the "mq/bin/imqcmd list dst -u admin -passfile
>>>>>$PASSFILE" command:
>>>>>                                                                              Msgs
>>>>>Name                                    Type   State    Producers  Consumers  Total Count  UnAck  Avg Size
>>>>>DEV_COMMON_ANY_HIGHPRIORITY_HACO        Queue  RUNNING  1          0          0            0      0.0
>>>>>DEV_COMMON_INDEX_CLIENT_127_0_1_1_8081  Queue  RUNNING  1          1          0            0      0.0
>>>>>DEV_COMMON_INDEX_SERVER                 Queue  RUNNING  1          1          0            0      0.0
>>>>>DEV_COMMON_MONITOR                      Queue  RUNNING  8          1          0            0      0.0
>>>>>DEV_COMMON_THE_ARCREPOS                 Queue  RUNNING  3          1          0            0      0.0
>>>>>DEV_COMMON_THE_SCHED                    Queue  RUNNING  1          1          0            0      0.0
>>>>>DEV_COMMON_THIS_HACO_127_0_1_1_8076     Queue  RUNNING  0          1          0            0      0.0
>>>>>DEV_COMMON_THIS_HACO_127_0_1_1_8081     Queue  RUNNING  1          2          0            0      0.0
>>>>>DEV_sos_ALL_BA                          Topic  RUNNING  1          1          0            0      0.0
>>>>>DEV_sos_ANY_BA                          Queue  RUNNING  1          1          0            0      0.0
>>>>>DEV_sos_THE_BAMON                       Queue  RUNNING  2          1          0            0      0.0
>>>>>mq.sys.dmq                              Queue  RUNNING  0          0          0            0      0.0
>>>>>At present my installation of Netarchive Suite uses only one harvester
>>>>>instance. It did the same thing again - uploaded both arc and metadata.arc
>>>>>file, but reported failure while uploading the arc file.
>>>>>A completely fresh installation of Netarchive Suite also produced another error:
>>>>>Cleaned up dk.netarkivet.common.distribute.HTTPRemoteFileRegistry
>>>>>Cleaning up dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient
>>>>>Cleaned up dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient
>>>>>Cleaning up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>>>Error while cleaning up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>>>	at dk.netarkivet.common.distribute.JMSConnection.removeListener(JMSConnection.java:640)
>>>>>	at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.close(JMSArcRepositoryClient.java:124)
>>>>>	at dk.netarkivet.harvester.harvesting.HarvestController.cleanup(HarvestController.java:114)
>>>>>	at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.cleanup(HarvestControllerServer.java:254)
>>>>>	at dk.netarkivet.common.utils.CleanupHook.run(CleanupHook.java:70)
>>>>>Cleaned up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin restart
>>>>>WARNING: Failed to restart: java.io.IOException: Failed to get a RMI stub:
>>>>>javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: ubuntu804desktop.localdomain; nested exception is: 
>>>>>	java.net.ConnectException: Connection refused]
>>>>>Jul 28, 2008 2:10:49 PM RMIConnector RMIClientCommunicatorAdmin-doStop
>>>>>WARNING: Failed to call the method close():java.rmi.ConnectException:
>>>>>Connection refused to host:; nested exception is: 
>>>>>	java.net.ConnectException: Connection refused
>>>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin Checker-run
>>>>>WARNING: Failed to check connection: java.net.ConnectException: Connection
>>>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin Checker-run
>>>>>WARNING: stopping
>>>>>>To me it looks like the ARCRepository does the right thing - uploads the
>>>>>>files AND sends "Store OK" back to the harvester. It seems that 
>>>>>>the harvester does not get that "Store OK" JMS-message - so it times out and
>>>>>>tries the same file again (3 times)
>>>>>>While uploading could you check how many applications are connected to the
>>>>>>JMS-broker on each JMS-queue. This is done with:
>>>>>>/opt/sun/mq/bin/imqcmd list dst -u admin -passfile $PASSFILE
>>>>>>where the file pointed at by $PASSFILE should contain one line:
>>>>>>the default password should be: admin - meaning that your $PASSFILE will
>>>>>>have the following line:
>>>>>>The output from imqcmd command should state if the harvester and the
>>>>>>ARCRepository are connected in the right way to the JMS-broker
>>>>>>Martin Bella wrote:
>>>>>>>Hi Eld,
>>>>>>>sorry for creating another thread. I did not get the answer in email form, but
>>>>>>>I can see it in the "Netarchive-users archives".
>>>>>>>Concerning the wrong checksum, you were right, now it works. Thanks.
>>>>>>>Concerning the second problem, I always start JMS as described in the
>>>>>>>Installation manual and I use the latest version of JMS, but the problem
>>>>>>>persists. Is there anything else (logs,...) I can send you to solve
>>>>>>>Martin Bella
>>>>>>>University Library in Bratislava
