[Netarchivesuite-users] Problems running jobs with NetarchiveSuite Version: 4.4.0

Meelis Mihhailov meelis at elfluido.ee
Wed Oct 29 10:01:26 CET 2014


Hi all!

A few days ago I finally decided to install NetarchiveSuite version 
4.4.0 on our server.
It was not that easy, but I eventually managed to get the system up and 
running. The problem is that I cannot run any jobs.

So what did I do?

1. Installed NetarchiveSuite version 4.4.0 using the Quick Start setup, 
running on a PostgreSQL database
2. Updated the databases before first use (including the tsindex thing)
3. Uploaded the default configuration to the database
4. Started everything with the startall script

I can access the web interface, add configurations, and see the system 
state, which reports "all ok" (no errors) :)

Now ... time to create a job.

1. Create harvest definition
2. Add seeds
3. Add date to execute harvest (current date and time)
4. Save
5. Activate

What happens? Well ... nothing. In the 
HarvestJobManagerApplication0.log.0 file I can see that the job has been 
added to the database:

*****************
Oct 28, 2014 9:00:10 PM 
dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO read
FINE: Partialharvest found w/ id 1
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.ScheduleDBDAO read
FINE: Creating frequency for (timeunit,anytime,numtimeunits,hour, 
minute, dayofweek,dayofmonth) = (4, true,12,null,null,null,null,)
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.Frequency 
getNewInstance
FINE: Creating a MONTHLY frequency.
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.ScheduleDBDAO read
FINE: Creating frequency for (timeunit,anytime,numtimeunits,hour, 
minute, dayofweek,dayofmonth) = (4, true,12,null,null,null,null,)
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.Frequency 
getNewInstance
FINE: Creating a MONTHLY frequency.
Oct 28, 2014 9:00:10 PM 
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask 
generateJobs
INFO: Starting to create jobs for harvest definition #1(MEELISEOMA)
Oct 28, 2014 9:00:10 PM 
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator generateJobs
INFO: Generating jobs for harvestdefinition # 1
Oct 28, 2014 9:00:10 PM 
dk.netarkivet.harvester.scheduler.jobgen.DefaultJobGenerator 
processDomainConfigurationSubset
FINE: Adding domainconfigs with the same order.xml for harvest # 1
Oct 28, 2014 9:00:10 PM 
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator getNewJob
INFO: No channel mapping registered for harvest id 1, will use default.
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.datamodel.HeritrixTemplate 
editOrderXML_ArchiveFormat
FINE: WARC format selected to be used by Heritrix
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.JobDBDAO create
FINE: Creating Job 1 (state = NEW, HD = 1, channel = FOCUSED, snapshot = 
false, forcemaxcount = -1, forcemaxbytes = 1000000000, 
forcemaxrunningtime = 0, orderxml = default_orderxml, numconfigs = 1, 
created = Tue Oct 28 21:00:11 EET 2014)
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job 
getHarvestFilenamePrefix
WARNING: HarvestnamePrefix not yet set for job 1. Set it by using the 
naming scheme. This should only happen for old jobs being read
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job 
setDefaultHarvestNamePrefix
FINE: Applying the default ArchiveFileNaming class 
'dk.netarkivet.harvester.harvesting.LegacyNamingConvention'.
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job 
setDefaultHarvestNamePrefix
FINE: The harvestPrefix of this job is: 1-1
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.scheduler.jobgen.DefaultJobGenerator 
processDomainConfigurationSubset
FINE: Created # 1 jobs for harvest # 1
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator generateJobs
INFO: Finished generating 1 jobs for harvestdefinition # 1
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$1 run
INFO: Created 1 jobs for harvest definition (MEELISEOMA)
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO update
FINE: 1 partialharvests records updated
Oct 28, 2014 9:00:11 PM 
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$1 run
FINE: Removed HD #1(MEELISEOMA) from list of harvestdefinitions to be 
scheduled. Harvestdefinitions still to be scheduled: []

*****************

I can also see it in the harvest status as "New", but that's it.

Job ID: 1
Harvest name: MEELISEOMA
Run number: 0
Start time: -
End time: -
Status: New
Harvest errors: -
Upload errors: -
Number of configurations: 1

It won't go any further and harvesting does not start. I can't find any 
errors or warnings (except the "HarvestnamePrefix not yet set for job 
1." message in the HarvestJobManagerApplication0.log.0 file).
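
For anyone who wants to double-check a log for warnings rather than read 
it by eye, here is a minimal Python sketch that picks out the WARNING and 
SEVERE records. It assumes the java.util.logging layout seen in the 
excerpt above (a timestamp/class/method header followed by a 
"LEVEL: message" line, either of which may be wrapped). The names 
parse_records, problems and SAMPLE are made up for this illustration and 
are not part of NetarchiveSuite.

```python
import re

# Start of a record, e.g. "Oct 28, 2014 9:00:10 PM dk.netarkivet..."
HEADER = re.compile(r"^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M\b")
# The standard java.util.logging level names, followed by ": "
LEVEL = re.compile(r"^(SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST): ")


def parse_records(lines):
    """Yield (header, level, message) tuples from log lines."""
    header, level, message = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if HEADER.match(line):
            if header:
                yield " ".join(header), level, " ".join(message)
            header, level, message = [line], None, []
            continue
        if not header:
            continue  # skip anything before the first record
        match = LEVEL.match(line)
        if level is None and match:
            level = match.group(1)
            message = [line[match.end():].strip()]
        elif level is None:
            header.append(line)  # wrapped header (class or method name)
        else:
            message.append(line)  # wrapped message text
    if header:
        yield " ".join(header), level, " ".join(message)


def problems(lines, levels=("WARNING", "SEVERE")):
    """Return only the records whose level is in `levels`."""
    return [rec for rec in parse_records(lines) if rec[1] in levels]


# Hypothetical sample input, shaped like the log lines in this mail.
SAMPLE = """\
Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
close
INFO: Closing HarvestControllerServer.
Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
visit
SEVERE: Received message stating that channel 'FOCUSED' is invalid. Will
stop.
""".splitlines()

if __name__ == "__main__":
    for header, level, message in problems(SAMPLE):
        print(level, "|", header, "|", message)
```

To run it against a real file, pass the open file instead of SAMPLE, for 
example problems(open("HarvestJobManagerApplication0.log.0")). A message 
that itself contains a line starting with a timestamp would be split 
into two records, so treat the output as a first pass only.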

In the system overview I see several different statuses for the 
HarvestControllerServer (we use 20 controller applications):

Oct 29, 2014 10:43:02 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
<init>
INFO: Requested to check the validity of harvest channel 'FOCUSED'

Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
close
INFO: Closing HarvestControllerServer.


And one of my favorites:

Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
visit
SEVERE: Received message stating that channel 'FOCUSED' is invalid. Will
stop.


I tried running it as the system user "crawler" and, on the last attempt, 
as root. Both give the same result.

To be honest ... I have no clue what to do next and hope you can help me 
find a way forward with version 4.4.0 :)



Meelis Mihhailov
National Library Of Estonia
meelis at nlib.ee


