[Netarchivesuite-users] Snapshot harvest failure / job generation configuration
nicolas.giraud at bnf.fr
Mon Apr 27 09:38:31 CEST 2009
Hi,
I have loaded about 2500 domains (~3000 seeds) into NAS to perform an
initial test of the snapshot harvest feature. I have launched this test
twice, and on both occasions the crawl failed after roughly 12 hours.
The error that occurred is attached to this message. If I understand
correctly, it means that the Heritrix JVM crashed somehow. What could be
the cause? I am still looking for clues in the logs bundled in the
metadata ARC file, with no luck so far.
Additionally, job generation does not proceed as we expected. I have 3
low-priority crawlers available in the setup, but only one huge job was
created with all ~3000 seeds in it, so 2 crawlers sat completely idle.
My expectation was that NAS would divide these 3000 seeds into a number
of smaller jobs and submit them in turn to all available crawlers.
I used the default job generation parameters. Is this behavior correct?
How should I configure job generation to obtain the desired behavior?
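To make the expectation concrete, here is a minimal sketch of the
splitting I had in mind. It is not NetarchiveSuite code and does not
reflect how NAS actually generates jobs: the function names, the crawler
names, the example seed URLs and the limit of 500 seeds per job are all
assumptions chosen only for illustration.

```python
def split_into_jobs(seeds, max_seeds_per_job):
    """Split a seed list into consecutive chunks of at most max_seeds_per_job."""
    if max_seeds_per_job < 1:
        raise ValueError("max_seeds_per_job must be at least 1")
    return [seeds[i:i + max_seeds_per_job]
            for i in range(0, len(seeds), max_seeds_per_job)]


def assign_round_robin(jobs, crawlers):
    """Hand jobs to crawlers in turn so that every crawler gets work."""
    assignment = {crawler: [] for crawler in crawlers}
    for index, job in enumerate(jobs):
        assignment[crawlers[index % len(crawlers)]].append(job)
    return assignment


# Hypothetical input: 3000 placeholder seeds and 3 low-priority crawlers.
seeds = ["http://example-%d.fr/" % n for n in range(3000)]
crawlers = ["crawler-1", "crawler-2", "crawler-3"]

# Hypothetical limit of 500 seeds per job, giving 6 jobs in total.
jobs = split_into_jobs(seeds, max_seeds_per_job=500)
assignment = assign_round_robin(jobs, crawlers)
```

With these assumed numbers, each of the 3 crawlers would receive 2 jobs
of 500 seeds instead of one crawler receiving everything.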
All the best,
Nicolas
----- Forwarded by Nicolas GIRAUD/ETS/BnF on 27/04/2009 08:53 -----
Message from: nicolas.giraud at bnf.fr
25/04/2009 01:41
To
nicolas.giraud at bnf.fr
Cc
Subject
Netarkivet error: Fatal error while operating job 'Job 3 (state =
SUBMITTED, HD = 1, priority = LOWPRIORITY, forcemaxcount = -1,
forcemaxbytes = 20000000, orderxml = default_obeyrobots, numconfigs =
2543)'
acheron2.bnf.fr
dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:670)
Fatal error while operating job 'Job 3 (state = SUBMITTED, HD = 1,
priority = LOWPRIORITY, forcemaxcount = -1, forcemaxbytes = 20000000,
orderxml = default_obeyrobots, numconfigs = 2543)'
dk.netarkivet.common.exceptions.IOFailure: Error during crawling. The crawl may have been only partially completed.
at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:657)
Caused by: dk.netarkivet.common.exceptions.IOFailure: Failed to connect to URL service:jmx:rmi:///jndi/rmi://localhost:8170/jmxrmi after 0 attempts
at dk.netarkivet.common.utils.JMXUtils.getJMXConnector(JMXUtils.java:383)
at dk.netarkivet.harvester.harvesting.JMXHeritrixController.getHeritrixJMXConnector(JMXHeritrixController.java:928)
at dk.netarkivet.harvester.harvesting.JMXHeritrixController.getCrawlJobAttribute(JMXHeritrixController.java:889)
at dk.netarkivet.harvester.harvesting.JMXHeritrixController.crawlIsEnded(JMXHeritrixController.java:471)
at dk.netarkivet.harvester.harvesting.HeritrixLauncher.doCrawlLoop(HeritrixLauncher.java:214)
at dk.netarkivet.harvester.harvesting.HeritrixLauncher.doCrawl(HeritrixLauncher.java:196)
at dk.netarkivet.harvester.harvesting.HarvestController.runHarvest(HarvestController.java:221)
at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:650)
Caused by: java.rmi.NoSuchObjectException: no such object in table
at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.getConnectionId(Unknown Source)
at javax.management.remote.rmi.RMIConnector.getConnectionId(RMIConnector.java:353)
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:315)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
at dk.netarkivet.common.utils.JMXUtils.getJMXConnector(JMXUtils.java:369)
... 7 more