[Netarchivesuite-users] Job already finished, but still started in NAS
aponb at gmx.at
Thu Oct 8 12:32:08 CEST 2009
>
>> > But the server did get a new job assigned and did start to crawl. I
>> > corrected the hosts file to the correct value. I was thinking that after
>> > finishing that job, the application would reconnect in the right way. But
>> > that did not happen. The job finished crawling and all files were
>> > uploaded to the storage, but NAS did not get any information about that.
>>
>
> That sounds very strange; would you mind double-checking that this is
> actually the case?
>
I checked it again, and the job is now finished with all statistical
information.
It probably just took a long time to generate, so I was too quick with my
message; sorry about that.
> JMX is in no way involved in NetarchiveSuite getting information about
> the crawl; that is purely handled by the exchange of JMS messages. Since
> the files could be uploaded, JMS seems to work. So I would imagine that
> the scheduler also got the message about the job being crawled.
>
> You can check this by looking at the history for the harvest definition
> that generated the crawled job.
>
That job is ok now.
>
>> > But the Harvester on that server did already get a new job assigned,
>> > which is already running. The problem is now that I have one job without
>> > any statistical information, and there will be more such jobs unless I
>> > stop that machine. (How can I stop a HarvesterControllerInstance right
>> > after uploading the files from the last job? Is there a possibility to
>> > stop assigning jobs to that instance?)
>>
>
> Unfortunately not. The best way to stop the machine is to kill it during
> upload if you can manage it.
... and then I could use the Upload Tool to upload that job manually. That
could be a good way.
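For the record, the invocation I have in mind is something like this (the
tool class name and the settings property are from my memory of the
documentation, and the file names are made up, so please correct me if I
am wrong):

  # upload the ARC files left over from the killed job; settings.xml
  # must point at the same JMS broker / arcrepository as the harvester
  java -Ddk.netarkivet.settings.file=conf/settings.xml \
       dk.netarkivet.archive.tools.Upload \
       42-2-20091008-1.arc metadata-42.arc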
It would be convenient to be able to control from the Admin GUI whether a
certain harvester instance should get further jobs assigned.
I will create a feature request for that.
> It should, assuming it is on the same host with the same ports and
> running the same kind of harvester. Otherwise, you might want to give it
> a different JMX port in your settings; that will enable it to be
> monitored as if it were an entirely new application, although the old
> entry on the Status page will remain there, showing a warning for the
> application that is no longer running.
>
OK, I understand.
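Just to be sure I read that right: it would mean something like this in
the restarted harvester's settings file (element names as I understand
the settings layout; the port numbers are just examples):

  <settings>
    <common>
      <jmx>
        <!-- give the restarted instance its own JMX/RMI ports so the
             monitor registers it as a new application -->
        <port>8110</port>
        <rmiPort>8210</rmiPort>
      </jmx>
    </common>
  </settings>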
> If you really haven't got the statistical data (please double check),
>
It is OK now.
> and all the files were uploaded and deleted from the server, it will
> probably require quite a bit of manual work to get the right data.
>
> You will have to reconstruct the log files in the crawldir from the
> metadata files. Then move that crawldir into the jobs directory of a
> harvester, and restart it.
>
> We should probably create a tool for doing what a harvester does with
> old jobs, to make this sort of recovery possible without restarting the
> harvesters.
>
I was thinking about the same steps. It would be good to know if that
tool will happen (after double checking ...).
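Roughly the sketch I had in mind, with all paths, names, and the start
script made up for illustration:

  # 1. Reconstruct the Heritrix logs from the job's metadata ARC file
  #    (crawl.log etc. are stored there as metadata:// records) into a
  #    logs/ subdirectory of a rebuilt crawldir:
  mkdir -p 42_1254987654321/logs
  #    ... extract crawl.log and progress-statistics.log into logs/ ...
  # 2. Move the crawldir into the harvester's jobs directory:
  mv 42_1254987654321 /home/harvester/jobsdir/
  # 3. Restart the harvester application so it treats it as an old job
  #    and generates/uploads the missing data (the script name depends
  #    on the installation):
  ./start_HarvestControllerApplication.sh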
Thanks for your help
a.