[Netarchivesuite-devel] Problem running new Batch Job

Asger Blekinge-Rasmussen abr at statsbiblioteket.dk
Wed Jul 8 10:46:41 CEST 2009


Hi

I am not really part of this bug, but here goes my input.

I am sure this is a case of one hand not knowing what the other hand
is doing, aka scoping.

For some reason you are able to load UURIFactory through one classloader,
but when UURIFactory is invoked, loading of the classes it needs is
delegated to UURIFactory's own classloader. And that classloader does not
know about the weird classes in the custom classloader.
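
To illustrate the scoping I mean, here is a minimal Java sketch. It is
not NetarchiveSuite code: DelegationDemo and its two helper methods are
made-up names, and the empty URLClassLoader instances only stand in for a
custom loader such as the one used by the batch job. It shows that a class
requested through a child loader is still defined by the parent that can
supply it, and that a loader sees only what it and its ancestors can
supply.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class DelegationDemo {

    /** True if a class requested through a child loader is defined by the parent. */
    static boolean definedByParent() throws Exception {
        ClassLoader parent = DelegationDemo.class.getClassLoader();
        // The child has no URLs of its own, so it can only delegate upward.
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            Class<?> viaChild = child.loadClass(DelegationDemo.class.getName());
            return viaChild.getClassLoader() == parent;
        }
    }

    /** True if a loader whose only ancestor is the bootstrap loader can see this class. */
    static boolean visibleWithoutParent() throws Exception {
        // A null parent means the bootstrap loader is the only ancestor.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            isolated.loadClass(DelegationDemo.class.getName());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("defined by parent: " + definedByParent());
        System.out.println("visible without parent: " + visibleWithoutParent());
    }
}
```

The first check should report true and the second false. Code in a class
defined by the parent resolves its own dependencies through that parent,
so anything available only in the child stays out of reach.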

KFC talked about replicating this bug as a unit test. I would like to
see the result of this.
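
As a possible starting point for such a test: the message in the quoted
trace, "Could not initialize class", is what the JVM reports when an
earlier attempt to run a class's static initialisation has already failed.
The sketch below reproduces only that general mechanism. Fragile is a
hypothetical stand-in, not UURIFactory, and the simulated failure is an
assumption; whether a failing static initialiser is really what happens in
the batch job is not established here.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureDemo {

    /** Hypothetical stand-in for a class whose static setup fails. */
    static class Fragile {
        // Evaluated during class initialisation; always throws in this sketch.
        static final Object INSTANCE = create();

        private static Object create() {
            throw new IllegalStateException("simulated missing dependency");
        }

        static String getInstance(String uri) {
            return uri;
        }
    }

    /** Calls Fragile twice and records what each attempt produced. */
    static List<String> probe() {
        List<String> seen = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                Fragile.getInstance("http://foo.bar");
                seen.add("ok");
            } catch (Throwable t) {
                seen.add(t.getClass().getName() + ": " + t.getMessage());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        probe().forEach(System.out::println);
    }
}
```

In a fresh JVM the first attempt fails with an ExceptionInInitializerError
that carries the real cause, and every later attempt fails with the
NoClassDefFoundError. If the same applies to the batch job, the underlying
cause would be in an earlier log entry rather than in the trace quoted
below.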

Regards



On Wed, 2009-07-08 at 10:30 +0200, Colin Samuel Rosenthal wrote:
> Asger and I spent some time looking at this yesterday. We're not much wiser, but we're stupider in a more interesting way :-)
> 
> We replaced the body of the initialise() method with:
> 
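>     // Load UURIFactory reflectively through this class's own class loader.
>     Class<?> urifactory = this.getClass().getClassLoader().loadClass(
>             "org.archive.net.UURIFactory");
>     Method method = urifactory.getMethod("getInstance", String.class);
>     // getInstance is static, so the receiver argument is null.
>     Object o = method.invoke(null, "http://foo.bar");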
> 
> (surrounded by a try/catch which rethrows the checked exceptions as RuntimeExceptions).
> 
> The interesting thing is that the NoClassDefFoundError is thrown by the last line - the method invocation. So
> the class is loaded, but cannot be linked/instantiated/initialised.
> 
> I also tried printing the toString() of urifactory.getClassLoader() to see what actual ClassLoader was invoked but this produced a security exception.
> 
> The Java ClassLoader delegation model means that the loadClass() method will first try to load the class from the parent of the current class-loader. So
> urifactory is presumably loaded from the system class-loader, not the LoadableJarBatch loader. In fact it loads even if we remove heritrix.jar from
> the RunBatch script. However, that means that any classes loaded from UURIFactory will not have access to the LoadableJar loader. I don't see why
> that should matter, unless we are missing a dependency in the bitarchive application. Is that possible? If so, then the problem has nothing to do
> with LoadableJarBatchJob and should be reproducible in any batch job.
> 
> --
> Colin
> ________________________________________
> From: Søren Vejrup Carlsen [svc at kb.dk]
> Sent: Tuesday, July 07, 2009 2:37 PM
> To: Colin Samuel Rosenthal; netarchivesuite-devel at lists.gforge.statsbiblioteket.dk
> Subject: SV: [Netarchivesuite-devel] Problem running new Batch Job
> 
> Hi Colin.
> It would seem that there is a bug in the LoadableJarBatchJob class.
> To determine whether this is the case, we need to add more logging in this class, specifically in the
> LoadableJarBatchJob.ByteJarLoader#findClass() method, to see why it fails to look up the UURIFactory class.
> 
> /Søren
> -----Original message-----
> From: netarchivesuite-devel-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-devel-bounces at lists.gforge.statsbiblioteket.dk] On behalf of Colin Rosenthal
> Sent: 7 July 2009 10:41
> To: netarchivesuite-devel at lists.gforge.statsbiblioteket.dk
> Subject: [Netarchivesuite-devel] Problem running new Batch Job
> 
> Hi,
> 
> I'm testing a new ArcBatchJob class and I don't understand why I'm
> getting a NoClassDefFoundError for org.archive.net.UURIFactory. I've
> tried both with and without specifically adding the heritrix jar as
> batch dependency but it doesn't seem to make any difference. Here is
> my command line and the output:
> 
> java -Ddk.netarkivet.settings.file=../settings_wayback_8080.xml
> -Dsettings.common.applicationInstanceId=CDXIndexer8080  -cp
> lib/dk.netarkivet.archive.jar   dk.netarkivet.archive.tools.RunBatch
> -Ndk.netarkivet.wayback.ExtractWaybackCDXBatchJob  -R'.*'
> -J/home/netarkiv/csr/batch/lib/dk.netarkivet.wayback.jar,/home/netarkiv/csr/batch/lib/wayback-core-1.4.0.jar,/home/netarkiv/csr/batch/lib/heritrix/lib/heritrix-1.14.3.jar
> Jul 7, 2009 10:35:12 AM
> dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient <init>
> INFO: JMSArcRepositoryClient will retry a store 3 times and timeout on
> each try after 3600000 milliseconds, and timeout on each getrequest
> after 60000 milliseconds.
> Jul 7, 2009 10:35:12 AM
> dk.netarkivet.common.distribute.JMSConnectionSunMQ <init>
> INFO: Creating instance of
> dk.netarkivet.common.distribute.JMSConnectionSunMQ
> Jul 7, 2009 10:35:12 AM dk.netarkivet.common.distribute.JMSConnection
> initConnection
> INFO: Initializing a JMS connection of type 'class
> dk.netarkivet.common.distribute.JMSConnectionSunMQ' to Broker at
> kb-test-adm-001.kb.dk:7676.
> Jul 7, 2009 10:35:13 AM
> dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient <init>
> INFO: JMSArcRepository listens for replies on channel '[Queue
> 'PLIGT_COMMON_THIS_REPOS_CLIENT_130_225_27_142_NA_CDXINDEXER8080']'
> Jul 7, 2009 10:35:13 AM
> dk.netarkivet.common.utils.batch.LoadableJarBatchJob <init>
> INFO: Loading loadableJarBatchJob using jarfiles:
> dk.netarkivet.wayback.jarwayback-core-1.4.0.jarheritrix-1.14.3.jar and
> jobclass 'dk.netarkivet.wayback.ExtractWaybackCDXBatchJob
> Running batch job 'dk.netarkivet.wayback.ExtractWaybackCDXBatchJob from
> jar-file
> '/home/netarkiv/csr/batch/lib/dk.netarkivet.wayback.jar,/home/netarkiv/csr/batch/lib/wayback-core-1.4.0.jar,/home/netarkiv/csr/batch/lib/heritrix/lib/heritrix-1.14.3.jar'
> on files matching '.*' on replica 'SBN', output written to stdout errors
> written to stderr
> Jul 7, 2009 10:35:15 AM
> dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient batch
> WARNING: The batch job
> 'ID:15-130.225.27.142(ff:40:86:d7:73:db)-55674-1246955713422: To
> PLIGT_COMMON_THE_REPOS ReplyTo
> PLIGT_COMMON_THIS_REPOS_CLIENT_130_225_27_142_NA_CDXINDEXER8080 OK Job:
> dk.netarkivet.common.utils.batch.LoadableJarBatchJob processing
> dk.netarkivet.wayback.ExtractWaybackCDXBatchJob from
> dk.netarkivet.common.utils.batch.LoadableJarBatchJob$ByteJarLoader at ba6c83'
> resulted in the following error: java.lang.NoClassDefFoundError: Could
> not initialize class org.archive.net.UURIFactory
> java.lang.NoClassDefFoundError: Could not initialize class
> org.archive.net.UURIFactory
>     at
> org.archive.wayback.util.url.AggressiveUrlCanonicalizer.urlStringToKey(AggressiveUrlCanonicalizer.java:234)
>     at
> org.archive.wayback.resourcestore.indexer.ARCRecordToSearchResultAdapter.adaptInner(ARCRecordToSearchResultAdapter.java:140)
>     at
> org.archive.wayback.resourcestore.indexer.ARCRecordToSearchResultAdapter.adapt(ARCRecordToSearchResultAdapter.java:65)
>     at
> dk.netarkivet.wayback.ExtractWaybackCDXBatchJob.processRecord(ExtractWaybackCDXBatchJob.java:64)
>     at
> dk.netarkivet.common.utils.arc.ARCBatchJob.processFile(ARCBatchJob.java:142)
>     at
> dk.netarkivet.common.utils.batch.LoadableJarBatchJob.processFile(LoadableJarBatchJob.java:213)
>     at
> dk.netarkivet.common.utils.batch.BatchLocalFiles.processFile(BatchLocalFiles.java:115)
>     at
> dk.netarkivet.common.utils.batch.BatchLocalFiles.run(BatchLocalFiles.java:82)
>     at
> dk.netarkivet.archive.bitarchive.Bitarchive.batch(Bitarchive.java:240)
>     at
> dk.netarkivet.archive.bitarchive.distribute.BitarchiveServer$1.run(BitarchiveServer.java:396)
> 
> Processed 0 files with 0 failures
> Cleaning up dk.netarkivet.common.distribute.JMSConnectionSunMQ
> Cleaned up dk.netarkivet.common.distribute.JMSConnectionSunMQ
> 
> --
> Colin



