[Netarchivesuite-users] Problems With The IndexServerAplication + WaybackIndexer DB

Mikis Seth Sørensen mss at statsbiblioteket.dk
Fri Dec 12 10:38:30 CET 2014


Hi Charles

The problem you see is that 1-1-20141210141254-00000-webarchive.upei.ca.warc hasn't been uploaded from the harvester to the archive. The harvester_high log should contain information about why the upload to the archive failed.
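One way to look for the cause is to scan the harvester log for lines that mention the WARC file and print a few lines after each hit, so that a trailing stack trace is included. The sketch below is only an illustration: the function name `find_mentions` is made up, the demo lines are placeholders rather than real NetarchiveSuite output, and the actual log file name depends on the installation.

```python
WARC_NAME = "1-1-20141210141254-00000-webarchive.upei.ca.warc"


def find_mentions(lines, needle, context=3):
    """Yield (line_number, text) for every line containing `needle`,
    followed by up to `context` lines after it."""
    remaining = 0
    for number, text in enumerate(lines, start=1):
        if needle in text:
            # Restart the window so the following lines are shown too.
            remaining = context + 1
        if remaining > 0:
            yield number, text.rstrip("\n")
            remaining -= 1


# Placeholder lines for the demo; these are NOT real NetarchiveSuite log output.
demo_lines = [
    "placeholder: unrelated line\n",
    "placeholder: something went wrong with " + WARC_NAME + "\n",
    "    placeholder detail line\n",
    "placeholder: another unrelated line\n",
]

for number, text in find_mentions(demo_lines, WARC_NAME, context=1):
    print(number, text)
```

For a real log, pass an open file object instead of the list, for example `with open(path_to_log) as f: ...`, where `path_to_log` is whatever the harvester_high log is called on your machine.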

Best
Mikis

From: Charles Tassell <ctassell at gmail.com>
Reply-To: "netarchivesuite-users at ml.sbforge.org" <netarchivesuite-users at ml.sbforge.org>
Date: Wednesday, December 10, 2014 at 3:54 PM
To: "netarchivesuite-users at ml.sbforge.org" <netarchivesuite-users at ml.sbforge.org>
Subject: Re: [Netarchivesuite-users] Problems With The IndexServerAplication + WaybackIndexer DB

Thanks, I fixed that up and ran a job, but after the crawl finished and the .warc was created, the next step seemed to die. The BitarchiveMonitorApplication0.log file says that it can't find the .warc file (although I have confirmed that it's there). The log says:

10-Dec-2014 11:14:24 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer replyToGetChecksumMessage
INFO: Replying GetChecksumMessage: 'ID:1795-137.149.200.20(a0:54:f2:7b:7:3c)-38167-1418224464362:
To ROBLIB_COMMON_THE_REPOS ReplyTo ROBLIB_COMMON_THIS_REPOS_CLIENT_137_149_200_20_GUIWS
Error: dk.netarkivet.common.exceptions.IOFailure: The batchjob did not find the file '1-1-20141210141254-00000-webarchive.upei.ca.warc' within the archive.
dk.netarkivet.common.exceptions.IOFailure: The batchjob did not find the file '1-1-20141210141254-00000-webarchive.upei.ca.warc' within the archive.
    at dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer.replyToGetChecksumMessage(BitarchiveMonitorServer.java:733)
    at dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer.replyConvertedBatch(BitarchiveMonitorServer.java:641)
    at dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer.access$200(BitarchiveMonitorServer.java:81)
    at dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer$2.run(BitarchiveMonitorServer.java:535)
 Arcfiles: 1-1-20141210141254-00000-webarchive.upei.ca.warc, ReplicaId: A, Checksum: null'.

But the file does exist: ./harvester_high/1_1418220769306/warcs/1-1-20141210141254-00000-webarchive.upei.ca.warc

  Is this another broken path in the deployment file?  Is there a better deployment file that I can use which installs the full suite (harvester, indexer and viewer) that is known to work?


On 14-12-10 10:07 AM, Mikis Seth Sørensen wrote:
Hi Charles

The application classes are defined in the deployment XML file. I can see that in 'deploy_standalone_example_with_wayback_apps.xml' the IndexServerApplication namespace is wrong: it is missing the harvester part, as you have noted (deploy_standalone_example.xml has the correct setting).

Try changing the line
<applicationName name="dk.netarkivet.archive.indexserver.IndexServerApplication">
to
<applicationName name="dk.netarkivet.harvester.indexserver.IndexServerApplication">
in your deploy XML and run the script generation and deployment again.
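If you would rather script the change than edit by hand, the following is a minimal sketch that swaps the quoted class name in the text of a deploy file. It uses a plain text replacement rather than an XML parser so that comments in the file are left untouched. The function name `fix_index_server_class` is made up for illustration, and the sample line assumes the element is written as `applicationName` with a `name` attribute.

```python
OLD_CLASS = "dk.netarkivet.archive.indexserver.IndexServerApplication"
NEW_CLASS = "dk.netarkivet.harvester.indexserver.IndexServerApplication"


def fix_index_server_class(xml_text):
    """Return (updated_text, replacements), replacing the quoted old class
    name with the new one and leaving everything else unchanged."""
    old_quoted = '"' + OLD_CLASS + '"'
    new_quoted = '"' + NEW_CLASS + '"'
    count = xml_text.count(old_quoted)
    return xml_text.replace(old_quoted, new_quoted), count


# Sample input: a single line modelled on the one quoted above (assumed spelling).
sample = '<applicationName name="' + OLD_CLASS + '">'
updated, count = fix_index_server_class(sample)
print(count, "replacement(s)")
print(updated)
```

To apply it to a real file, read the text with `pathlib.Path(...).read_text()`, keep a backup copy, write the updated text back, and then rerun the script generation and deployment as described above.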

Best
Mikis

From: Charles Tassell <charles at islandadmin.ca>
Reply-To: "netarchivesuite-users at ml.sbforge.org" <netarchivesuite-users at ml.sbforge.org>
Date: Wednesday, December 10, 2014 at 2:23 PM
To: "netarchivesuite-users at ml.sbforge.org" <netarchivesuite-users at ml.sbforge.org>
Subject: Re: [Netarchivesuite-users] Problems With The IndexServerAplication + WaybackIndexer DB

Sorry, did some grepping and found the comments in the deployment file for how to create the Wayback database, so that is sorted out.  I'm still wondering about the IndexServerApplication path though.

On 14-12-10 09:06 AM, Charles Tassell wrote:
Hi Guys,

  I'm still having some issues with getting a fresh 4.4.1 install going.  There seem to be two issues left after fixing the queue names in the deployment file.

  First, when I try to start the IndexServerApplication I get the following error message:

Exception in thread "main" java.lang.NoClassDefFoundError: dk/netarkivet/archive/indexserver/IndexServerApplication
Caused by: java.lang.ClassNotFoundException: dk.netarkivet.archive.indexserver.IndexServerApplication
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: dk.netarkivet.archive.indexserver.IndexServerApplication.  Program will exit.

I did some digging, and it looks like the actual class path is dk.netarkivet.harvester.indexserver.IndexServerApplication. Is that correct, or are the harvester and archive IndexServerApplications different classes?
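For anyone wanting to repeat that kind of digging, here is a rough sketch that lists which jar files under a directory contain a given class. A jar is a zip archive, so the standard `zipfile` module can read its entry names. The helper names are made up, and the demo builds a throwaway jar in a temporary directory instead of assuming where the NetarchiveSuite jars live on any particular install.

```python
import tempfile
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Convert a dotted class name into the entry name used inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(lib_dir, class_name):
    """Return the jar files below lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            continue  # not a readable zip archive, skip it
    return hits


# Demo with a throwaway jar; it only mimics the layout of a real one.
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "demo.jar", "w") as archive:
        archive.writestr(
            class_entry("dk.netarkivet.harvester.indexserver.IndexServerApplication"), b""
        )
    for name in (
        "dk.netarkivet.harvester.indexserver.IndexServerApplication",
        "dk.netarkivet.archive.indexserver.IndexServerApplication",
    ):
        print(name, "->", [jar.name for jar in jars_containing(tmp, name)])
```

Pointing `jars_containing` at the directory that holds the installed jars would show whether either class name is actually present.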

  Secondly, the WaybackIndexer does not seem to be able to connect to the database at port 8124.  It looks like the installer script doesn't create the derby instance for the WaybackIndexer.  Are there any docs on how to do that manually?
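A quick first check for the database problem is whether anything is listening on port 8124 at all. The sketch below tries a plain TCP connection. It assumes the database is expected on the same machine (`localhost`), which may not match your setup, and the function name `port_is_open` is made up. An open port only shows that something is listening; it does not prove that it is Derby or that the Wayback database exists.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Assumption: the database should be reachable on this machine at port 8124.
if port_is_open("localhost", 8124):
    print("Something is listening on port 8124")
else:
    print("Nothing is accepting connections on port 8124")
```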





