[Netarchivesuite-users] Nothing happens after starting generating dedupcrawllogindex

aponb at gmx.at
Tue May 26 17:11:29 CEST 2009


>
> Hi Andreas
>
> I have found two inconsistencies in your configuration file:
>
> The 'settings.notification' branch in your settings at deployGlobal should be placed under 'settings.common.notification'.
>
> The 'settings.harvester.datamodel.defaultMaxbytes' in the settings for machine 'wc06' should be 'settings.harvester.datamodel.domain.defaultMaxbytes'.
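> A quick way to check a settings file for misplaced branches like these is to turn each dotted key into an XPath and see whether it matches. The sketch below makes an assumption: each dot in a key such as 'settings.common.notification' corresponds to one level of XML element nesting in the deploy configuration. The class name SettingsPathCheck and the sample XML are hypothetical, made up for illustration. The code uses only the JDK's javax.xml APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper; assumes dotted setting keys map 1:1 onto nested XML elements.
public class SettingsPathCheck {

    /** Turns "settings.common.notification" into "//settings/common/notification". */
    static String toXPath(String dottedKey) {
        return "//" + dottedKey.replace('.', '/');
    }

    /** Returns true if at least one element anywhere in the XML matches the dotted key. */
    static boolean hasPath(String xml, String dottedKey) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(toXPath(dottedKey), doc, XPathConstants.NODESET);
            return hits.getLength() > 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + dottedKey, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment laid out the way the corrected keys describe.
        String xml = "<deployGlobal><settings><common><notification/></common>"
                + "<harvester><datamodel><domain><defaultMaxbytes>1000</defaultMaxbytes>"
                + "</domain></datamodel></harvester></settings></deployGlobal>";
        System.out.println(hasPath(xml, "settings.common.notification"));   // corrected location
        System.out.println(hasPath(xml, "settings.notification"));          // old, misplaced location
        System.out.println(hasPath(xml, "settings.harvester.datamodel.domain.defaultMaxbytes"));
    }
}
```

> For the sample fragment, this prints true, false, true. The search uses "//" so that it still matches when the settings branch is wrapped in an outer element such as deployGlobal or a machine entry.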
>
>
> It is very unlikely that the above inconsistencies are causing the problem.
> More likely there is something wrong with how Heritrix is started, and there could be something in the Heritrix logs that indicates what the problem is.
>
> Best regards
> Jonas and Søren.

I corrected the wrong settings, and you were right that they didn't cause
the problem.
You were also right that something is wrong with how Heritrix is
started. heritrix_dmesg.log shows a NullPointerException:
java.lang.NullPointerException
	at org.archive.crawler.admin.CrawlJobHandler.loadJobs(CrawlJobHandler.java:251)
	at org.archive.crawler.admin.CrawlJobHandler.<init>(CrawlJobHandler.java:221)
	at org.archive.crawler.admin.CrawlJobHandler.<init>(CrawlJobHandler.java:187)
	at org.archive.crawler.Heritrix.<init>(Heritrix.java:405)
	at org.archive.crawler.Heritrix.<init>(Heritrix.java:393)
	at org.archive.crawler.Heritrix.doCmdLineArgs(Heritrix.java:718)
	at org.archive.crawler.Heritrix.main(Heritrix.java:556)

It seems that the state.job file is not available.
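One way to confirm this is to inspect the jobs directory before starting Heritrix. The sketch below rests on two assumptions. First, it assumes Heritrix 1.x keeps each job in its own subdirectory of a jobs directory, with a state.job file inside. Second, it assumes the NPE might come from iterating a directory listing. In Java, File.listFiles() returns null when the path is missing or is not a directory. Whether that is what actually happens at CrawlJobHandler.java:251 is not confirmed here. The class name StateJobCheck and the default path "jobs" are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic; the jobs-directory layout is an assumption, not taken from Heritrix docs.
public class StateJobCheck {

    /**
     * Returns the names of subdirectories of jobsDir that have no state.job file.
     * File.listFiles() returns null if jobsDir is missing or not a directory,
     * so that case is reported explicitly instead of causing a NullPointerException.
     */
    static List<String> jobDirsWithoutStateFile(File jobsDir) {
        File[] entries = jobsDir.listFiles();
        if (entries == null) {
            throw new IllegalStateException("Not a readable directory: " + jobsDir.getAbsolutePath());
        }
        List<String> missing = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isDirectory() && !new File(entry, "state.job").isFile()) {
                missing.add(entry.getName());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the real Heritrix jobs directory as the first argument; "jobs" is only a placeholder.
        File jobsDir = new File(args.length > 0 ? args[0] : "jobs");
        try {
            List<String> missing = jobDirsWithoutStateFile(jobsDir);
            System.out.println(missing.isEmpty()
                    ? "Every job directory has a state.job file."
                    : "Job directories without state.job: " + missing);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```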
All of this is happening because I am trying to use Heritrix 1.14.3 with
deduplicator 0.4 and NAS 3.8.
Is there anything I have to do besides replacing the Heritrix jars and
the deduplicator.jar?

Regards
a.
