[Netarchivesuite-users] more upload errors

Bjarne Andersen netarkivet at statsbiblioteket.dk
Mon Jul 28 15:39:57 CEST 2008


It might be a problem with the HarvestControllerApplication and the SideKick.

The HarvestControllerApplication is implemented in such a way that it shuts itself down after it has finished a job. The SideKick then makes sure that this specific harvester is restarted. So each HarvestControllerServer has a corresponding SideKick.

The SideKick must know of a script to restart the harvester. You have to make sure that the way you start the harvester in your init-script is the same way the SideKick starts it after each job, preferably by calling the same script that starts the HarvestControllerServer.

So you might want to put the startup of the HarvestControllerServer in its own little script and call that from your init-script (remember the variables you set globally in the init-script).

In our startup-scripts (automatically generated with the DeployApplication) one specific harvester has the following start-script:
#!/bin/bash
export CLASSPATH=/home/prod/PROD/lib/dk.netarkivet.harvester.jar:/home/prod/PROD/lib/dk.netarkivet.archive.jar:/home/prod/PROD/lib/dk.netarkivet.viewerproxy.jar:/home/prod/PROD/lib/dk.netarkivet.monitor.jar:$CLASSPATH;
cd /home/prod/PROD
java -Xmx1536m \
  -Dsettings.harvester.harvesting.heritrix.guiPort=8095 \
  -Dsettings.harvester.harvesting.heritrix.jmxPort=8195 \
  -Ddk.netarkivet.settings.file=/home/prod/PROD/conf/settings_harvester_8081.xml \
  -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
  -Djava.util.logging.config.file=/home/prod/PROD/conf/log_harvestcontrollerapplication.prop \
  -Dsettings.common.jmx.port=8110 \
  -Dsettings.common.jmx.rmiPort=8210 \
  -Dsettings.common.jmx.passwordFile=/home/prod/PROD/conf/jmxremote.password \
  -Djava.security.manager \
  -Djava.security.policy=/home/prod/PROD/conf/security.policy \
  dk.netarkivet.harvester.harvesting.HarvestControllerApplication \
  < /dev/null > start_harvester_8081.sh.log 2>&1 &

And the matching SideKick has the following startup-script:
java -Xmx1536m \
  -Ddk.netarkivet.settings.file=/home/prod/PROD/conf/settings_harvester_8081.xml \
  -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger \
  -Djava.util.logging.config.file=/home/prod/PROD/conf/log_sidekick.prop \
  -Dsettings.common.jmx.port=8111 \
  -Dsettings.common.jmx.rmiPort=8211 \
  -Dsettings.common.jmx.passwordFile=/home/prod/PROD/conf/jmxremote.password \
  -Djava.security.manager \
  -Djava.security.policy=/home/prod/PROD/conf/security.policy \
  dk.netarkivet.harvester.sidekick.SideKick \
  dk.netarkivet.harvester.sidekick.HarvestControllerServerMonitorHook \
  ./conf/start_harvester_8081.sh \
  < /dev/null > start_sidekick_8081.sh.log 2>&1 &

You can see that the SideKick takes an argument which is exactly the startup-script of the HarvestControllerServer.

I can see that your init-script references /home/user/workspace/netarchive/harvester.sh - put the startup of the Harvester in that file.
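
As a rough sketch only, that file could look something like the following. It assumes the harvester options from your init-script quoted below (JMX ports 8130/8230, remote file port 5444, the HIGHPRIORITY queue) and the same JDK path. The names JAVA_BIN, HARVESTER_CMD and start_harvester are made up for this example. As written it only prints the command line so it can be inspected; the real script would end with a call to start_harvester instead.

```shell
#!/bin/bash
# Sketch of /home/user/workspace/netarchive/harvester.sh.
# Option values are copied from the init-script quoted below; JAVA_BIN,
# HARVESTER_CMD and start_harvester are hypothetical names for this example.

NetarchiveDir=${NetarchiveDir:-/home/user/workspace/netarchive}
JAVA_BIN=${JAVA_BIN:-/opt/jdk1.5.0_16/bin/java}

# Set the classpath here, so the script does not depend on what the caller
# (init-script or SideKick) happens to have exported.
lib=$NetarchiveDir/lib
export CLASSPATH="$lib/dk.netarkivet.harvester.jar:$lib/dk.netarkivet.archive.jar:$lib/dk.netarkivet.viewerproxy.jar:$lib/dk.netarkivet.monitor.jar${CLASSPATH:+:$CLASSPATH}"

# The complete harvester command line, one array element per argument.
HARVESTER_CMD=(
    "$JAVA_BIN" -Xmx2048m
    -Dsettings.common.remoteFile.port=5444
    -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger
    "-Djava.util.logging.config.file=$NetarchiveDir/conf/log.prop"
    -Dsettings.common.jmx.port=8130
    -Dsettings.common.jmx.rmiPort=8230
    -Dsettings.harvester.harvesting.queuePriority=HIGHPRIORITY
    -Dsettings.common.http.port=8081
    dk.netarkivet.harvester.harvesting.HarvestControllerApplication
)

# Start the harvester in the background from the installation directory.
start_harvester() {
    cd "$NetarchiveDir" || return 1
    "${HARVESTER_CMD[@]}" < /dev/null > harvester.sh.log 2>&1 &
}

# Print the command for inspection. In the real script, replace this line
# with a call to start_harvester.
printf '%s\n' "${HARVESTER_CMD[*]}"
```

The init-script would then call this file instead of running java directly for the harvester, so that the first start and every restart by the SideKick go through the same command.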

Even with your current setup, I would expect things to fail only when the second job is run, because that is when a failed restart of the HarvestControllerServer would show up.

So try a clean install with a clean JMS-broker, and check the imqcmd output before running any jobs: there should be only one consumer on the harvester queue.
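
One possible way to make that check less error-prone is to filter the listing with awk. The sketch below assumes each destination is on one line, in the column order of the listing quoted further down (Name, Type, State, Producers, Consumers, ...), and that harvester queue names contain HACO, as they do in this thread. The function name is made up for the example, and the sample rows are taken from the quoted listing.

```shell
#!/bin/bash
# List HACO (harvester) queues that have more than one consumer.
# Assumes unwrapped "imqcmd list dst" rows in this column order:
#   Name  Type  State  Producers  Consumers  ...
# The function name is hypothetical.
haco_queues_with_extra_consumers() {
    awk '$2 == "Queue" && $1 ~ /HACO/ && $5 + 0 > 1 { print $1, $5 }'
}

# Sample rows from the listing quoted below, each joined onto one line.
sample='DEV_COMMON_ANY_HIGHPRIORITY_HACO        Queue  RUNNING  1          0          0     0      0.0
DEV_COMMON_THIS_HACO_127_0_1_1_8076     Queue  RUNNING  0          1          0     0      0.0
DEV_COMMON_THIS_HACO_127_0_1_1_8081     Queue  RUNNING  1          2          0     0      0.0'

result=$(printf '%s\n' "$sample" | haco_queues_with_extra_consumers)
echo "$result"
```

Against a live broker the function would be fed from the command quoted below, for example: /opt/sun/mq/bin/imqcmd list dst -u admin -passfile $PASSFILE | haco_queues_with_extra_consumers. No output means no harvester queue has more than one consumer.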

best
Bjarne Andersen

Martin Bella wrote:
> I do not think I have any java processes from old installations. At the time of
> uploading the ps command showed only one instance of JMS Broker, BitarchiveApplication, GUIApplication, ArcRepositoryApplication, BitarchiveMonitorApplication, HarvestControllerApplication, SideKick, IndexServerApplication and ViewerProxyApplication.
> 
> Btw., here is what the init script on my testing machine looks like:
> 
> #!/bin/bash
> 
> export NetarchiveDir=/home/user/workspace/netarchive
> export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.harvester.jar
> export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.archive.jar
> export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.viewerproxy.jar
> export CLASSPATH=$CLASSPATH:$NetarchiveDir/lib/dk.netarkivet.monitor.jar
> export JAVA_OPTS=-Xmx2048m
> export LOG_SETTINGS="-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djava.util.logging.config.file=$NetarchiveDir/conf/log.prop"
> 
> cd $NetarchiveDir
> 
> # Bitarchive machines
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8150 -Dsettings.common.jmx.rmiPort=8250"
> export APP_OPTIONS="-Dsettings.archive.bitarchive.thisLocation=sos -Dsettings.archive.bitarchive.thisCredentials=examplecredentials"
> export APP=dk.netarkivet.archive.bitarchive.BitarchiveApplication
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> # Admin machine
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8100 -Dsettings.common.jmx.rmiPort=8200"
> export APP=dk.netarkivet.common.webinterface.GUIApplication
> export SETTING="-Dsettings.common.remoteFile.port=5440"
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8120 -Dsettings.common.jmx.rmiPort=8220"
> export APP=dk.netarkivet.archive.arcrepository.ArcRepositoryApplication
> export SETTING="-Dsettings.common.remoteFile.port=5441"
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8110 -Dsettings.common.jmx.rmiPort=8210"
> export APP_OPTIONS="-Dsettings.common.archive.bitarchive.thisLocation=sos -Dsettings.common.http.port=8081"
> export SETTING="-Dsettings.common.remoteFile.port=5443"
> export APP=dk.netarkivet.archive.bitarchive.BitarchiveMonitorApplication 
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> # Harvester machines
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8130 -Dsettings.common.jmx.rmiPort=8230"
> export APP_OPTIONS="-Dsettings.harvester.harvesting.queuePriority=HIGHPRIORITY -Dsettings.common.http.port=8081"
> export SETTING="-Dsettings.common.remoteFile.port=5444"
> export APP=dk.netarkivet.harvester.harvesting.HarvestControllerApplication
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8140 -Dsettings.common.jmx.rmiPort=8240"
> export APP_OPTIONS="-Dsettings.common.http.port=8081"
> export APP=dk.netarkivet.harvester.sidekick.SideKick
> export APP_ARGS1=dk.netarkivet.harvester.sidekick.HarvestControllerServerMonitorHook
> export APP_ARGS2=/home/user/workspace/netarchive/harvester.sh
> export SETTING="-Dsettings.common.remoteFile.port=5445"
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP $APP_ARGS1 $APP_ARGS2 &
> 
> # Access servers
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8160 -Dsettings.common.jmx.rmiPort=8260"
> export APP=dk.netarkivet.archive.indexserver.IndexServerApplication
> export SETTING="-Dsettings.common.remoteFile.port=5446"
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP &
> 
> export JMX_SETTINGS="-Dsettings.common.jmx.port=8170 -Dsettings.common.jmx.rmiPort=8270"
> export APP_OPTIONS="-Dsettings.common.http.port=8081 -Dsettings.viewerproxy.baseDir=viewerproxy_8081 -Dsettings.archive.bitarchive.thisLocation=sos"
> export APP=dk.netarkivet.viewerproxy.ViewerProxyApplication
> export SETTING="-Dsettings.common.remoteFile.port=5447"
> /opt/jdk1.5.0_16/bin/java $JAVA_OPTS $SETTING $LOG_SETTINGS $JMX_SETTINGS $APP_OPTIONS $APP &
> 
> Best,
> Martin Bella
> University Library in Bratislava
> 
> 
>>Thanks. This seems to show the problem:
>>
>>the JMS-queue: DEV_COMMON_THIS_HACO_127_0_1_1_8081 has registered 2
>>consumers so another harvester-instance is also running and eating the 
>>"Store OK" Messages sent from your ARCRepository.
>>
>>You should check that you do not have running processes from old
>>installations - e.g. use "ps fax | grep java"
>>
>>The other errors look like some startup / shutdown problems with
>>NetarchiveSuite and the JMS-broker. Make sure that the JMS-broker is 
>>running (and cleaned up) before starting any applications. And make sure
>>that the JMS-broker is still running when you stop applications, 
>>since they will do a clean disconnect from the JMS-broker.
>>
>>best
>>Bjarne Andersen
>>
>>
>>Martin Bella wrote:
>>
>>>Hi Bjarne,
>>>
>>>here is the output of the "mq/bin/imqcmd list dst -u admin -passfile
>>>$PASSFILE" command:
>>>
>>>----------------------------------------------------------------------------------------------------------
>>>Name                                    Type   State    Producers  Consumers  Msgs
>>>                                                                              Total Count  UnAck  Avg Size
>>>----------------------------------------------------------------------------------------------------------
>>>DEV_COMMON_ANY_HIGHPRIORITY_HACO        Queue  RUNNING  1          0          0            0      0.0
>>>DEV_COMMON_INDEX_CLIENT_127_0_1_1_8081  Queue  RUNNING  1          1          0            0      0.0
>>>DEV_COMMON_INDEX_SERVER                 Queue  RUNNING  1          1          0            0      0.0
>>>DEV_COMMON_MONITOR                      Queue  RUNNING  8          1          0            0      0.0
>>>DEV_COMMON_THE_ARCREPOS                 Queue  RUNNING  3          1          0            0      0.0
>>>DEV_COMMON_THE_SCHED                    Queue  RUNNING  1          1          0            0      0.0
>>>DEV_COMMON_THIS_HACO_127_0_1_1_8076     Queue  RUNNING  0          1          0            0      0.0
>>>DEV_COMMON_THIS_HACO_127_0_1_1_8081     Queue  RUNNING  1          2          0            0      0.0
>>>DEV_sos_ALL_BA                          Topic  RUNNING  1          1          0            0      0.0
>>>DEV_sos_ANY_BA                          Queue  RUNNING  1          1          0            0      0.0
>>>DEV_sos_THE_BAMON                       Queue  RUNNING  2          1          0            0      0.0
>>>mq.sys.dmq                              Queue  RUNNING  0          0          0            0      0.0
>>>
>>>At present my installation of Netarchive Suite uses only one harvester
>>>instance. It did the same thing again - uploaded both arc and metadata.arc
>>>file, but reported failure while uploading the arc file.
>>>
>>>Very fresh installation of Netarchive Suite also produced another error
>>>message:
>>>
>>>Cleaned up dk.netarkivet.common.distribute.HTTPRemoteFileRegistry
>>>Cleaning up dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient
>>>Cleaned up dk.netarkivet.common.distribute.monitorregistry.JMSMonitorRegistryClient
>>>Cleaning up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>Error while cleaning up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>java.lang.NullPointerException
>>>	at dk.netarkivet.common.distribute.JMSConnection.removeListener(JMSConnection.java:640)
>>>	at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.close(JMSArcRepositoryClient.java:124)
>>>	at dk.netarkivet.harvester.harvesting.HarvestController.cleanup(HarvestController.java:114)
>>>	at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer.cleanup(HarvestControllerServer.java:254)
>>>	at dk.netarkivet.common.utils.CleanupHook.run(CleanupHook.java:70)
>>>Cleaned up dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin restart
>>>WARNING: Failed to restart: java.io.IOException: Failed to get a RMI stub:
>>>javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: ubuntu804desktop.localdomain; nested exception is: 
>>>	java.net.ConnectException: Connection refused]
>>>Jul 28, 2008 2:10:49 PM RMIConnector RMIClientCommunicatorAdmin-doStop
>>>WARNING: Failed to call the method close():java.rmi.ConnectException:
>>>Connection refused to host: 127.0.1.1; nested exception is: 
>>>	java.net.ConnectException: Connection refused
>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin Checker-run
>>>WARNING: Failed to check connection: java.net.ConnectException: Connection
>>>refused
>>>Jul 28, 2008 2:10:49 PM ClientCommunicatorAdmin Checker-run
>>>WARNING: stopping
>>>
>>>Best,
>>>Martin Bella
>>>University Library in Bratislava
>>>
>>>
>>>
>>>>For me it looks like the ARCRepository does the right thing - uploads the
>>>>files AND sends "Store OK" back to the harvester. It seems that 
>>>>the harvester does not get that "Store OK" JMS-message - so it times out and
>>>>tries the same file again (3 times)
>>>>
>>>>While uploading, could you check how many applications are connected to the
>>>>JMS-broker on each JMS-queue? This is done with:
>>>>/opt/sun/mq/bin/imqcmd list dst -u admin -passfile $PASSFILE
>>>>
>>>>where the file pointed at by $PASSFILE should contain one line:
>>>>imq.imqcmd.password=WHAT_EVER_YOUR_PASSWORD_IS_SET_TO
>>>>
>>>>the default password should be: admin - meaning that your $PASSFILE will
>>>>have the following line:
>>>>imq.imqcmd.password=admin
>>>>
>>>>The output from imqcmd command should state if the harvester and the
>>>>ARCRepository are connected in the right way to the JMS-broker
>>>>
>>>>best
>>>>-- 
>>>>Bjarne Andersen
>>>>Daily Manager - netarchive.dk
>>>>
>>>>State & University Library
>>>>Universitetsparken
>>>>DK-8000 Aarhus C
>>>>T: +45 89462165 - C: +45 25662353
>>>>CVR/SE 10100682 - EAN 5798000791084
>>>>http://netarchive.dk
>>>>
>>>>Martin Bella wrote:
>>>>
>>>>
>>>>>Hi Eld,
>>>>>
>>>>>sorry for creating another thread. I did not get the answer in email form, but
>>>>>I can see it in the "Netarchive-users archives".
>>>>>
>>>>>Concerning the wrong checksum, you were right, now it works. Thanks.
>>>>>Concerning the second problem, I always start JMS as described in the
>>>>>Installation manual and I use the latest version of JMS, but the problem
>>>>>persists. Is there anything else (logs,...) I can send you to solve
>>>>>this?
>>>>>
>>>>>Best,
>>>>>Martin Bella
>>>>>University Library in Bratislava
>>>>>_______________________________________________
>>>>>NetarchiveSuite-users mailing list
>>>>>NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
>>>>>https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users
>>>>
>>
>>-- 
>>Bjarne Andersen
>>Driftsleder - netarkivet.dk
>>
>>Statsbiblioteket
>>Universitetsparken
>>8000 Århus C
>>Tlf. 89462165 - Mobil 25662353
>>CVR/SE 10100682 - EAN 5798000791084
>>http://netarkivet.dk
> 

-- 
Bjarne Andersen
Driftsleder - netarkivet.dk

Statsbiblioteket
Universitetsparken
8000 Århus C
Tlf. 89462165 - Mobil 25662353
CVR/SE 10100682 - EAN 5798000791084
http://netarkivet.dk