[Netarchivesuite-users] Problems running jobs with NetarchiveSuite Version: 4.4.0
Meelis Mihhailov
meelis at elfluido.ee
Wed Oct 29 10:01:26 CET 2014
Hi all!
A few days ago I finally decided to install NetarchiveSuite version
4.4.0 on our server.
It was not entirely easy, but I finally managed to get the system up and
running. The problem is that I cannot run any jobs.
So what did I do?
1. Installed NetarchiveSuite version 4.4.0 with the Quick Start setup,
running on a PostgreSQL database
2. Updated the databases before first use (including the tsindex thing)
3. Uploaded the default configuration to the database
4. Started everything with the startall script.
I can access the web interface, add configurations, and see the system
state, which reports "all ok" (no errors) :)
Now ... time to create a job.
1. Create harvest definition
2. Add seeds
3. Add date to execute harvest (current date and time)
4. Save
5. Activate
What happens? Well ... nothing. In the
HarvestJobManagerApplication0.log.0 file I can see that the job has been
added to the database:
*****************
Oct 28, 2014 9:00:10 PM
dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO read
FINE: Partialharvest found w/ id 1
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.ScheduleDBDAO read
FINE: Creating frequency for (timeunit,anytime,numtimeunits,hour,
minute, dayofweek,dayofmonth) = (4, true,12,null,null,null,null,)
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.Frequency
getNewInstance
FINE: Creating a MONTHLY frequency.
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.ScheduleDBDAO read
FINE: Creating frequency for (timeunit,anytime,numtimeunits,hour,
minute, dayofweek,dayofmonth) = (4, true,12,null,null,null,null,)
Oct 28, 2014 9:00:10 PM dk.netarkivet.harvester.datamodel.Frequency
getNewInstance
FINE: Creating a MONTHLY frequency.
Oct 28, 2014 9:00:10 PM
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask
generateJobs
INFO: Starting to create jobs for harvest definition #1(MEELISEOMA)
Oct 28, 2014 9:00:10 PM
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator generateJobs
INFO: Generating jobs for harvestdefinition # 1
Oct 28, 2014 9:00:10 PM
dk.netarkivet.harvester.scheduler.jobgen.DefaultJobGenerator
processDomainConfigurationSubset
FINE: Adding domainconfigs with the same order.xml for harvest # 1
Oct 28, 2014 9:00:10 PM
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator getNewJob
INFO: No channel mapping registered for harvest id 1, will use default.
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.datamodel.HeritrixTemplate
editOrderXML_ArchiveFormat
FINE: WARC format selected to be used by Heritrix
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.JobDBDAO create
FINE: Creating Job 1 (state = NEW, HD = 1, channel = FOCUSED, snapshot =
false, forcemaxcount = -1, forcemaxbytes = 1000000000,
forcemaxrunningtime = 0, orderxml = default_orderxml, numconfigs = 1,
created = Tue Oct 28 21:00:11 EET 2014)
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job
getHarvestFilenamePrefix
WARNING: HarvestnamePrefix not yet set for job 1. Set it by using the
naming scheme. This should only happen for old jobs being read
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job
setDefaultHarvestNamePrefix
FINE: Applying the default ArchiveFileNaming class
'dk.netarkivet.harvester.harvesting.LegacyNamingConvention'.
Oct 28, 2014 9:00:11 PM dk.netarkivet.harvester.datamodel.Job
setDefaultHarvestNamePrefix
FINE: The harvestPrefix of this job is: 1-1
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.scheduler.jobgen.DefaultJobGenerator
processDomainConfigurationSubset
FINE: Created # 1 jobs for harvest # 1
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.scheduler.jobgen.AbstractJobGenerator generateJobs
INFO: Finished generating 1 jobs for harvestdefinition # 1
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$1 run
INFO: Created 1 jobs for harvest definition (MEELISEOMA)
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.datamodel.HarvestDefinitionDBDAO update
FINE: 1 partialharvests records updated
Oct 28, 2014 9:00:11 PM
dk.netarkivet.harvester.scheduler.HarvestJobGenerator$JobGeneratorTask$1 run
FINE: Removed HD #1(MEELISEOMA) from list of harvestdefinitions to be
scheduled. Harvestdefinitions still to be scheduled: []
*****************
and I can see it in the harvest status as "new" but that's it.
Job ID: 1
Harvest name: MEELISEOMA
Run number: 0
Start time: -
End time: -
Status: New
Harvest errors: -
Upload errors: -
Number of configurations: 1
It won't go any further, and harvesting does not start. I can't find any
errors or warnings (except the "HarvestnamePrefix not yet set for job
1." thing in the HarvestJobManagerApplication0.log.0 file).
In the system overview I see several different statuses for the
HarvestControllerServer (we use 20 controller applications):
Oct 29, 2014 10:43:02 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
<init>
INFO: Requested to check the validity of harvest channel 'FOCUSED'
Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
close
INFO: Closing HarvestControllerServer.
And one of my favorites:
Oct 29, 2014 10:42:59 AM dk.netarkivet.harvester.harvesting.distribute.HarvestControllerServer
visit
SEVERE: Received message stating that channel 'FOCUSED' is invalid. Will
stop.
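The excerpts above follow the two-part java.util.logging layout: a timestamp/class/method header, then a "LEVEL: message" line, with long lines wrapped. With 20 controller applications writing logs, a small filter can help surface only the WARNING and SEVERE entries. The following is a minimal sketch, not part of NetarchiveSuite; the names parse_records, filter_problems and Record are made up for illustration, and the timestamp pattern assumes the English "Oct 29, 2014 10:42:59 AM" format seen in this message.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional

# Header lines start with a timestamp such as "Oct 29, 2014 10:42:59 AM"
# (assumed format, taken from the excerpts in this message).
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M")
# Standard java.util.logging level names.
LEVEL = re.compile(r"^(SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST): (.*)$")


class Record(NamedTuple):
    timestamp: str
    source: str   # class and method, joined even if wrapped
    level: str
    message: str  # message text, wrapped lines joined with spaces


def _build(chunk: List[str]) -> Optional[Record]:
    """Turn the lines of one log entry into a Record (None if no level line)."""
    ts = TIMESTAMP.match(chunk[0]).group(0)
    header = [chunk[0][len(ts):].strip()]
    level = None
    message: List[str] = []
    for line in chunk[1:]:
        m = LEVEL.match(line) if level is None else None
        if m:
            level = m.group(1)
            message.append(m.group(2))
        elif level is None:
            header.append(line.strip())   # wrapped class/method name
        else:
            message.append(line.strip())  # wrapped message text
    if level is None:
        return None
    return Record(ts, " ".join(h for h in header if h), level, " ".join(message))


def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group raw log lines into records; a timestamp starts a new record."""
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        starts_record = TIMESTAMP.match(line) is not None
        if starts_record and current:
            record = _build(current)
            if record:
                yield record
            current = []
        if starts_record or current:
            current.append(line)
    if current:
        record = _build(current)
        if record:
            yield record


def filter_problems(lines: Iterable[str],
                    levels=("SEVERE", "WARNING")) -> List[Record]:
    """Return only the records whose level is in `levels`."""
    return [r for r in parse_records(lines) if r.level in levels]
```

It could be used by opening each application log in turn, for example `filter_problems(open("HarvestJobManagerApplication0.log.0"))`, and printing the returned records.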
I tried running it as the system user "crawler" and, for the last
attempt, as root. Both give the same result.
To be honest ... I have no clue what to do next and hope you can help me
find a way forward with version 4.4.0 :)
Meelis Mihhailov
National Library Of Estonia
meelis at nlib.ee