[Netarchivesuite-devel] Reconnect NAS to running jobs

aponb at gmx.at aponb at gmx.at
Wed Apr 11 13:54:42 CEST 2018


Thanks, Nicholas, for your help. This prompted me to check the status of the 
job in NAS, and it really was FAILED. The reason given was "Job xxx has exceeded 
its timeout of 10080 minutes. Changing status to FAILED. " (10080 minutes is 7 days). 
Setting the job status back to 2 (Started) for that job in the database brings back 
the complete Details and Actions page for this running job!
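A minimal sketch of the re-registration idea from the quoted replies: stop silently ignoring progress reports for jobs the GUI does not think are running, wait a grace period after the restart (5-10 minutes was suggested), then re-register the jobs that are still reporting. It is in Java because NetarchiveSuite is written in Java, but everything here is hypothetical: OrphanJobDetector and its methods are not part of the NAS code base, and it assumes the scheduler sees every incoming progress report, including those for jobs it does not consider running.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not NetarchiveSuite code: after a GUI/scheduler restart,
 * remember progress reports from jobs that are not registered as running, wait a
 * grace period, then re-register the jobs that are still reporting.
 */
public class OrphanJobDetector {
    private final Instant restartedAt;
    private final Duration gracePeriod;
    // Job ids the scheduler currently treats as running.
    private final Set<Long> knownRunning = new HashSet<>();
    // Unknown job id -> time of its most recent progress report.
    private final Map<Long, Instant> orphanLastSeen = new HashMap<>();

    public OrphanJobDetector(Instant restartedAt, Duration gracePeriod) {
        this.restartedAt = restartedAt;
        this.gracePeriod = gracePeriod;
    }

    public void markRunning(long jobId) {
        knownRunning.add(jobId);
    }

    public boolean isRunning(long jobId) {
        return knownRunning.contains(jobId);
    }

    /** Record a progress report instead of dropping it when the job is unknown. */
    public void onProgressReport(long jobId, Instant receivedAt) {
        if (!knownRunning.contains(jobId)) {
            orphanLastSeen.put(jobId, receivedAt);
        }
    }

    /**
     * Once the grace period after the restart has passed, re-register every unknown
     * job whose last report is no older than one grace period. Returns the ids re-registered.
     */
    public Set<Long> reconcile(Instant now) {
        Set<Long> reRegistered = new HashSet<>();
        if (now.isBefore(restartedAt.plus(gracePeriod))) {
            return reRegistered; // still inside the grace period: keep collecting
        }
        Instant cutoff = now.minus(gracePeriod);
        Iterator<Map.Entry<Long, Instant>> it = orphanLastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Instant> entry = it.next();
            if (!entry.getValue().isBefore(cutoff)) {
                knownRunning.add(entry.getKey());
                reRegistered.add(entry.getKey());
            }
            it.remove(); // recent orphans are promoted, stale ones are forgotten
        }
        return reRegistered;
    }
}
```

With a 10-minute grace period, a job that kept reporting during the first minutes after the restart is re-registered on the first reconcile call after the grace period ends, while a job whose reports stopped early is forgotten. In NAS itself, re-registering would presumably also mean putting the job back into status 2 (Started) in the database, as was done by hand above.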

> And not if you look directly in the job definition.
>
> Sometimes the job is set to failed but is showing as running in the 
> running jobs pages.
>
> Anyway, if the job is not set to failed yet, it should be a bit 
> easier to re-register it.
>
> I'm guessing that progress reports for jobs that are not "known" to be 
> running are silently ignored.
>
> And the running jobs page "only" shows what is in the progress reports 
> table.
>
> So somehow from progress report it should be possible to re-register 
> jobs which appear to still be running.
>
> This could maybe be done fairly easily. Someone would need to look in 
> the code to see if progress reports are just ignored for jobs that the 
> GUI does not think are running.
>
>
> Best
>
> Nicholas
>
> ------------------------------------------------------------------------
> *From:* Netarchivesuite-devel 
> <netarchivesuite-devel-bounces at ml.sbforge.org> on behalf of 
> aponb at gmx.at <aponb at gmx.at>
> *Sent:* Wednesday, April 11, 2018 11:14:45 AM
> *To:* netarchivesuite-devel at ml.sbforge.org
> *Subject:* Re: [Netarchivesuite-devel] Reconnect NAS to running jobs
> Hi Nicholas,
>
> thanks for your quick response!
>
> The job was not set to failed after restarting NAS. Actually, the job is 
> running and also appears in the running jobs overview, but it is not 
> showing the correct progress, queued files and so on.
>
> Yes, it would be nice if NAS checked for running jobs and could 
> reconnect to them. It doesn't need to be automatic; a manual step would 
> be OK. But I understand that redesigning the current behavior would be 
> a lot of work.
>
> Regards
> a.
>
>
>
> On 2018-04-11 10:38, Nicholas Grooss Clarke wrote:
>>
>> Hi Andreas
>>
>>
>> When you restarted, did the job get set to failed, and does it also 
>> not appear in the running jobs overview?
>>
>>
>> Basically you would like the GUI to check for running jobs when it is 
>> restarted?
>>
>>
>> One problem with this is that the harvestcontrollers do not listen to 
>> JMS when busy.
>>
>> The H3 monitor does not yet have a list of all hosts/ports where H3 
>> instances are configured to run.
>>
>> So currently the H3 monitor only reads the state of jobs from the 
>> database to see what to monitor.
>>
>>
>> I think Søren changed the GUI code at some point so that 
>> HarvestControllers reconnect with the GUI when it gets a message 
>> from an "unknown" source.
>>
>>
>> I have no idea how difficult it would be to change the 
>> GUI/Jobscheduler to identify running jobs during a restart.
>>
>> It would need a grace period of 5-10 minutes for progress 
>> reports to "magically" appear, so it would know that an orphaned job 
>> needs to be registered.
>>
>>
>> Redesigning the jobscheduler would make this functionality 
>> straightforward, if/when that happens.
>>
>> Theoretically it should also be possible to restart the 
>> HarvestController and reconnect to the H3 job.
>>
>> But not with the existing design.
>>
>>
>> Best
>>
>> Nicholas
>>
>> ------------------------------------------------------------------------
>> *From:* Netarchivesuite-devel 
>> <netarchivesuite-devel-bounces at ml.sbforge.org> on behalf of 
>> aponb at gmx.at <aponb at gmx.at>
>> *Sent:* Wednesday, April 11, 2018 10:04:50 AM
>> *To:* netarchivesuite-devel at ml.sbforge.org
>> *Subject:* [Netarchivesuite-devel] Reconnect NAS to running jobs
>>
>> I had to restart NAS (5.4) without the jmsbroker; only the running 
>> Heritrix/HarvestControllerApplications were kept alive. After restarting 
>> NAS, I no longer get the "Details and Actions on Running 
>> Job" page at http://nasurl/History/history/job/xxx/
>>
>> I only get
>>
>>
>>       Details and Actions on Running Job xxx
>>
>> NAS Job xxx is in state FAILED.
>>
>> I can only access the job directly via the Heritrix3 web console.
>>
>> Is there any chance to reconnect the NAS-App with this running job?
>>
>> Regards
>> a.
>>
>>
>>
>> _______________________________________________
>> Netarchivesuite-devel mailing list
>> Netarchivesuite-devel at ml.sbforge.org
>> https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel
>
>
>
>



