[Netarchivesuite-users] Job already finished, but still started in NAS

aponb at gmx.at aponb at gmx.at
Tue Oct 6 17:46:00 CEST 2009


Hi!

I have the following problem. A few days ago we put a repaired server 
back into NAS. The server is used as a web crawler. After starting the 
applications on this server, I waited for them to appear in the 
Systemstate, but they did not. Because of a configuration error in the 
hosts file on this machine, the following error appeared in the GUI:

Unable to proxy JMX beans on host 'webcrawler05.onb.ac.at.onb.ac.at:8400', last seen active at 'Tue Oct 06 08:12:14 CEST 2009'
dk.netarkivet.common.exceptions.IOFailure: Could not connect to service:jmx:rmi://webcrawler05.onb.ac.at.onb.ac.at:8600/jndi/rmi://webcrawler05.onb.ac.at.onb.ac.at:8400/jmxrmi
    at dk.netarkivet.common.utils.JMXUtils.getMBeanServerConnection(JMXUtils.java:189)
    at dk.netarkivet.common.utils.JMXUtils.getMBeanServerConnection(JMXUtils.java:164)
    at dk.netarkivet.monitor.jmx.RmiProxyConnectionFactory.getConnection(RmiProxyConnectionFactory.java:67)
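
The host name in this error ends in a doubled domain suffix 
(onb.ac.at.onb.ac.at), which is presumably how the hosts file mistake 
showed up. A minimal sketch of how such a name could be detected is 
given here. The class and method names are made up for illustration and 
are not part of NetarchiveSuite; the URL layout and the ports 8400 and 
8600 are simply copied from the error message.

```java
// Hypothetical helper, not part of NetarchiveSuite.
class JmxHostCheck {

    // Builds a JMX service URL with the same layout as the one in the error message.
    static String buildJmxUrl(String host, int jmxPort, int rmiPort) {
        return "service:jmx:rmi://" + host + ":" + rmiPort
                + "/jndi/rmi://" + host + ":" + jmxPort + "/jmxrmi";
    }

    // Returns true if the host name ends with the same run of labels twice,
    // e.g. "webcrawler05.onb.ac.at.onb.ac.at".
    static boolean hasRepeatedDomainSuffix(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        for (int k = 1; k <= n / 2; k++) {
            boolean same = true;
            for (int i = 0; i < k; i++) {
                // Compare the last k labels with the k labels just before them.
                if (!labels[n - k + i].equalsIgnoreCase(labels[n - 2 * k + i])) {
                    same = false;
                    break;
                }
            }
            if (same) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] hosts = {
            "webcrawler05.onb.ac.at.onb.ac.at",
            "webcrawler05.onb.ac.at"
        };
        for (String host : hosts) {
            System.out.println(buildJmxUrl(host, 8400, 8600)
                    + " repeated suffix: " + hasRepeatedDomainSuffix(host));
        }
    }
}
```

This only inspects the string. It does not contact the machine or 
check whether the name actually resolves.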

Nevertheless, the server was assigned a new job and started to crawl. I 
corrected the hosts file. I expected that after finishing that job the 
application would reconnect properly, but that did not happen. The job 
finished crawling and all files were uploaded to the storage, but NAS 
did not get any information about that. Meanwhile the harvester on that 
server has already been assigned a new job, which is now running.

The problem now is that I have one job without any statistical 
information, and there will be more such jobs unless I stop that 
machine. How can I stop a HarvesterController instance right after it 
has uploaded the files from the last job? Is there a way to stop 
assigning jobs to that instance? If I can stop that HarvesterController, 
a restart of that controller should bring the applications of that 
server into the Systemstate, right?

Without the statistical information NAS will crawl many of the domains 
of that job again, although they may already be completed. So how can I 
recalculate the statistical data from the ARC files that are already 
uploaded to the storage?

Thanks for your help!
a.


