[Netarchivesuite-users] Nothing happens after starting generating dedupcrawllogindex

Søren Vejrup Carlsen svc at kb.dk
Tue May 26 17:37:44 CEST 2009


Hi Andreas.
Heritrix 1.14.3 and deduplicator 0.4.0 are both part of NetarchiveSuite 3.8.0, which has just been released.

So you could try using the 3.8.0 distribution as-is instead of your own modifications, and see if the Heritrix error remains.

Note that the deduplicator 0.4.0.jar bundled with NAS 3.8 has been compiled against heritrix-1.14.3.jar, not the heritrix-1.10 that is bundled with the source of deduplicator-0.4.0.
A deduplicator jar built against Heritrix 1.10 but run with Heritrix 1.14.3 could generate strange errors.

/Søren

-----Original message-----
From: netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk [mailto:netarchivesuite-users-bounces at lists.gforge.statsbiblioteket.dk] On behalf of aponb at gmx.at
Sent: 26 May 2009 17:11
To: netarchivesuite-users at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-users] Nothing happens after starting generating dedupcrawllogindex

>
> Hi Andreas
>
> I have found two inconsistencies in your configuration file:
>
> The 'settings.notification' branch in your settings at deployGlobal should be placed under 'settings.common.notification'.
>
> The 'settings.harvester.datamodel.defaultMaxbytes' in the settings for machine 'wc06' should be 'settings.harvester.datamodel.domain.defaultMaxbytes'.
>
>
> It is very unlikely that the above inconsistencies are causing the problem.
> More likely there is something wrong with how Heritrix is started, and there could be something in the Heritrix logs that indicates what the problem is.
>
> Best regards
> Jonas and Søren.
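
For reference, the two corrections quoted above can be sketched as the following settings nesting. This is only an illustration, assuming the dotted setting names map one-to-one onto nested XML elements; the comments are placeholders, and the actual contents and values have to come from the existing configuration.

```xml
<settings>
  <common>
    <!-- previously placed directly under <settings> in deployGlobal -->
    <notification>
      <!-- existing notification settings go here, unchanged -->
    </notification>
  </common>
  <harvester>
    <datamodel>
      <domain>
        <!-- previously placed directly under <datamodel> for machine wc06 -->
        <defaultMaxbytes><!-- existing value, unchanged --></defaultMaxbytes>
      </domain>
    </datamodel>
  </harvester>
</settings>
```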

I corrected the wrong settings, and you were right that they didn't cause
the problem.
You were also right that something goes wrong when Heritrix is called.
The heritrix_dmesg.log shows a NullPointerException:
java.lang.NullPointerException
        at org.archive.crawler.admin.CrawlJobHandler.loadJobs(CrawlJobHandler.java:251)
        at org.archive.crawler.admin.CrawlJobHandler.<init>(CrawlJobHandler.java:221)
        at org.archive.crawler.admin.CrawlJobHandler.<init>(CrawlJobHandler.java:187)
        at org.archive.crawler.Heritrix.<init>(Heritrix.java:405)
        at org.archive.crawler.Heritrix.<init>(Heritrix.java:393)
        at org.archive.crawler.Heritrix.doCmdLineArgs(Heritrix.java:718)
        at org.archive.crawler.Heritrix.main(Heritrix.java:556)

It seems that the state.job file is not available.
All of this happens because I am trying to use Heritrix 1.14.3 with
deduplicator 0.4 with NAS 3.8.
Is there anything I have to do besides replacing the Heritrix jars and
the deduplicator.jar?

Regards
a.

_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at lists.gforge.statsbiblioteket.dk
https://lists.gforge.statsbiblioteket.dk/mailman/listinfo/netarchivesuite-users



