[Netarchivesuite-users] Nothing happens after starting generating dedupcrawllogindex
Jonas Lindberg Frellesen
jolf at kb.dk
Tue May 26 13:07:33 CEST 2009
Hi Andreas
I have found two inconsistencies in your configuration file:
The 'settings.notification' branch in your settings at deployGlobal should be placed under 'settings.common.notification'.
The 'settings.harvester.datamodel.defaultMaxbytes' in the settings for machine 'wc06' should be 'settings.harvester.datamodel.domain.defaultMaxbytes'.
It is very unlikely that the above inconsistencies are causing the problem.
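If you want to double-check where a branch sits in a settings file, here is a minimal sketch using only the standard Java XML API. The class name SettingsPathCheck, the helper hasPath, and the sample XML (including the placeholder value 0) are made up for illustration and are not part of NetarchiveSuite. It only follows the first matching child element at each level.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of NetarchiveSuite.
class SettingsPathCheck {

    /** Returns true if the dotted path exists as a chain of nested elements. */
    public static boolean hasPath(String xml, String dottedPath) throws Exception {
        Element current = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        String[] parts = dottedPath.split("\\.");
        // The first path component must match the root element.
        if (!current.getNodeName().equals(parts[0])) {
            return false;
        }
        for (int i = 1; i < parts.length; i++) {
            Element next = null;
            // Skip text and comment nodes; take the first child element with the right name.
            for (Node n = current.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && n.getNodeName().equals(parts[i])) {
                    next = (Element) n;
                    break;
                }
            }
            if (next == null) {
                return false;
            }
            current = next;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Sample settings with both branches in the corrected positions.
        // Element contents are placeholders, not real values.
        String xml = "<settings>"
                + "<common><notification/></common>"
                + "<harvester><datamodel><domain>"
                + "<defaultMaxbytes>0</defaultMaxbytes>"
                + "</domain></datamodel></harvester>"
                + "</settings>";
        String[] paths = {
            "settings.common.notification",
            "settings.notification",
            "settings.harvester.datamodel.domain.defaultMaxbytes",
            "settings.harvester.datamodel.defaultMaxbytes"
        };
        for (String path : paths) {
            System.out.println(path + " present: " + hasPath(xml, path));
        }
    }
}
```

To check a real file, read its contents into the xml string instead of using the sample.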
More likely there is something wrong with how Heritrix is started, and there could be something in the Heritrix logs that indicates what the problem is.
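One way to see whether Heritrix came up at all is to ask a JMX server whether the MBean named in your log is registered. The sketch below is only an illustration: the class MBeanCheck is made up, and it queries the local platform MBean server rather than the remote Heritrix process, so for your MBean name it will simply report that nothing is registered. To test the real harvester you would have to obtain an MBeanServerConnection to the Heritrix JMX port (7003 according to your log) with the right credentials, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, not part of NetarchiveSuite or Heritrix.
class MBeanCheck {

    /** Returns true if an MBean with the given name is registered on the connection. */
    public static boolean isRegistered(MBeanServerConnection conn, String name) throws Exception {
        return conn.isRegistered(new ObjectName(name));
    }

    public static void main(String[] args) throws Exception {
        // MBean name copied from the quoted log output.
        String heritrix = "org.archive.crawler:name=Heritrix,type=CrawlService,"
                + "jmxport=7003,guiport=8803,host=webcrawler06.onb.ac.at";

        // Assumption: the local platform MBean server stands in for a remote connection.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();

        System.out.println("Heritrix MBean registered: " + isRegistered(local, heritrix));
        // A standard JVM MBean, for comparison.
        System.out.println("Runtime MBean registered: " + isRegistered(local, "java.lang:type=Runtime"));
    }
}
```

If the same check against the real Heritrix JMX port also reports the MBean as missing, the Heritrix process probably never finished starting, and its own logs are the place to look.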
Best regards
Jonas and Søren.
-----Original message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of aponb at gmx.at
Sent: 26 May 2009 11:48
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Nothing happens after starting generating dedupcrawllogindex
Hi Jonas!
I have now changed my configuration to the settings I attached. I switched to the FTPRemoteFile class and gave each HarvesterApplication a different remotefile port number.
Now the system is not able to start the crawler:
FINE: Successfully received an index of type 'DEDUP_CRAWL_LOG' for the jobs [82]
May 19, 2009 4:57:08 PM dk.netarkivet.common.distribute.FTPRemoteFile logOn
FINE: Logged onto ftp://netarchive:**********@wc05:21
May 19, 2009 4:57:08 PM dk.netarkivet.common.distribute.FTPRemoteFile cleanup
FINE: Deleting file 'segments.gz-54974-1242745028067' from ftp server
May 19, 2009 4:57:08 PM dk.netarkivet.common.distribute.FTPRemoteFile logOn
FINE: Logged onto ftp://netarchive:**********@wc05:21
May 19, 2009 4:57:08 PM dk.netarkivet.common.distribute.FTPRemoteFile cleanup
FINE: Deleting file 'segments.gz-54974-1242745028067' from ftp server
May 19, 2009 4:57:08 PM dk.netarkivet.common.distribute.FTPRemoteFile logOn
FINE: Logged onto ftp://netarchive:**********@wc05:21
May 19, 2009 4:57:08 PM dk.netarkivet.archive.indexserver.FileBasedCache cache
FINE: release lock on filechannel sun.nio.ch.FileChannelImpl@1ef7de4
May 19, 2009 4:57:08 PM dk.netarkivet.archive.indexserver.FileBasedCache getIndex
INFO: Generated index '/home/netarchive/data/netarchivesuite/cache/DEDUP_CRAWL_LOG/82-cache' of id '[82]', request was for '[82]'
May 19, 2009 4:57:08 PM dk.netarkivet.harvester.harvesting.HeritrixFiles setIndexDir
FINE: Setting deduplication index dir '/home/netarchive/data/netarchivesuite/cache/DEDUP_CRAWL_LOG/82-cache'
May 19, 2009 4:57:08 PM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread run
INFO: Starting crawl of job : 83
May 19, 2009 4:57:08 PM dk.netarkivet.harvester.harvesting.HeritrixFiles writeOrderXml
FINE: Writing order-file to disk as file: /home/netarchive/apps/netarchivesuite/ONB/8803harvester/83_1242745027376/order.xml
May 19, 2009 4:57:08 PM dk.netarkivet.harvester.harvesting.JMXHeritrixController <init>
INFO: Starting Heritrix for job 83 of harvest 27 in 8803harvester/83_1242745027376
May 19, 2009 4:57:08 PM dk.netarkivet.harvester.harvesting.JMXHeritrixController getJMXAdminName
FINE: The JMX username used for connecting to the Heritrix GUI is: 'admin'.
May 19, 2009 4:57:09 PM dk.netarkivet.common.utils.JMXUtils executeCommand
FINE: Preparing to execute completedJobs with args [] on org.archive.crawler:name=Heritrix,type=CrawlService,jmxport=7003,guiport=8803,host=webcrawler06.onb.ac.at
May 19, 2009 4:59:20 PM dk.netarkivet.harvester.harvesting.JMXHeritrixController getJMXAdminName
FINE: The JMX username used for connecting to the Heritrix GUI is: 'admin'.
May 19, 2009 4:59:20 PM dk.netarkivet.common.utils.JMXUtils executeCommand
FINE: Preparing to execute shutdown with args [] on org.archive.crawler:name=Heritrix,type=CrawlService,jmxport=7003,guiport=8803,host=webcrawler06.onb.ac.at
May 19, 2009 5:01:31 PM dk.netarkivet.harvester.harvesting.JMXHeritrixController cleanup
SEVERE: JMX error while cleaning up Heritrix controller
dk.netarkivet.common.exceptions.IOFailure: Failed to find MBean org.archive.crawler:name=Heritrix,type=CrawlService,jmxport=7003,guiport=8803,host=webcrawler06.onb.ac.at for executing shutdown after 17 attempts
    at dk.netarkivet.common.utils.JMXUtils.executeCommand(JMXUtils.java:262)
    at dk.netarkivet.common.utils.JMXUtils.executeCommand(JMXUtils.java:426)
    at dk.netarkivet.harvester.harvesting.JMXHeritrixController.executeHeritrixCommand(JMXHeritrixController.java:852)
    at dk.netarkivet.harvester.harvesting.JMXHeritrixController.cleanup(JMXHeritrixController.java:505)
    at dk.netarkivet.harvester.harvesting.HeritrixLauncher.doCrawl(HeritrixLauncher.java:200)
    at dk.netarkivet.harvester.harvesting.HarvestController.runHarvest(HarvestController.java:221)
    at dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer$HarvesterThread.run(HarvestControllerServer.java:650)
Caused by: javax.management.InstanceNotFoundException: org.archive.crawler:name=Heritrix,type=CrawlService,jmxport=7003,guiport=8803,host=webcrawler06.onb.ac.at
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getClassLoaderFor(DefaultMBeanServerInterceptor.java:1438)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getClassLoaderFor(JmxMBeanServer.java:1276)
    at com.sun.jmx.remote.security.MBeanServerAccessController.getClassLoaderFor(MBeanServerAccessController.java:313)
    at javax.management.remote.rmi.RMIConnectionImpl$5.run(RMIConnectionImpl.java:1325)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.management.remote.rmi.RMIConnectionImpl.getClassLoaderFor(RMIConnectionImpl.java:1322)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:771)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)
    at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
    at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:978)
    at dk.netarkivet.common.utils.JMXUtils.executeCommand(JMXUtils.java:243)
    ... 6 more
Could you please have another look at my settings? Maybe you can find another mistake!
Thx
a.