[Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl
Tue Hejlskov Larsen
tlr at kb.dk
Fri Aug 30 13:43:51 CEST 2024
Normally we only have planned network downtime of less than 15 minutes. If the network downtime is unplanned and lasts longer, the description below is a realistic JMS broker message scenario at our site.
Best regards
Tue
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Peter Svanberg
Sent: Friday, August 30, 2024 1:29 PM
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl
Hello Tue and Sara!
Sara confirms my draft plan.
1. Pause all ongoing jobs, so far in advance that there is no risk of jobs being in post-processing.
2. Deactivate/postpone start of selective jobs.
3. After outage: unpause all paused jobs. (Much clicking … maybe it can be scripted; see the sketch below.)
But what happens then? Do the unpaused jobs continue, and does post-processing work?
And if that works, do the queue handling and the connections between the main server and all the harvester instances remain intact, so that new jobs are created, or must something be restarted?
Tue, were you describing what happens if you do no preparation at all?
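A minimal sketch of how the pause/unpause clicking might be scripted, assuming the harvester instances run Heritrix 3 with its REST API enabled on the default port; the hosts, credentials and job names below are hypothetical placeholders, not values from any actual deployment:

    # Hedged sketch, not NetarchiveSuite tooling: pause or unpause crawl jobs
    # through the Heritrix 3 REST API (POST action=pause / action=unpause to
    # the job resource, digest auth). Hosts, credentials and job names are
    # hypothetical placeholders.
    import requests
    from requests.auth import HTTPDigestAuth

    AUTH = HTTPDigestAuth("admin", "secret")      # hypothetical credentials
    JOBS = {                                      # hypothetical host -> running jobs
        "crawler01.example.org": ["job_123456"],
        "crawler02.example.org": ["job_123457"],
    }

    def set_action(host, job, action):
        """Send a single action ("pause" or "unpause") to one job."""
        r = requests.post(
            f"https://{host}:8443/engine/job/{job}",
            data={"action": action},
            auth=AUTH,
            verify=False,                         # Heritrix typically uses a self-signed certificate
            headers={"Accept": "application/xml"},
        )
        r.raise_for_status()

    for host, jobs in JOBS.items():
        for job in jobs:
            set_action(host, job, "pause")        # run again with "unpause" after the outage

Whether pausing jobs this way, behind the GUI's back, leaves NetarchiveSuite's own bookkeeping consistent is exactly the question above, so treat it as an illustration only.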
Best regards
Peter
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of sara.aubry at bnf.fr
Sent: Thursday, August 29, 2024 10:42 AM
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl
Hello Peter,
Planned network outages are a lot easier to handle than unexpected ones!
At BnF, we pause all running jobs until we have confirmation that everything is back to normal. We have already done this during broad crawls, so for more than 60 crawlers.
The challenge is to make sure no new jobs start, so we temporarily deactivate the selective harvests.
If we have jobs in a post-processing stage that we cannot pause, we stop the HarvestController by putting a shutdown.txt file in the job directory.
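A minimal sketch of that last step, assuming you already know the running job's directory on the harvester host; the path below is a hypothetical placeholder for whatever layout your installation uses:

    # Hedged sketch: drop a shutdown.txt file into a running job's directory
    # so the HarvestController stops, as described above. The path is a
    # hypothetical placeholder for your own installation's layout.
    from pathlib import Path

    job_dir = Path("/home/harvester/jobs/123456_highpriority")  # hypothetical
    (job_dir / "shutdown.txt").touch()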
Best,
Sara
De : "Tue Hejlskov Larsen" <tlr at kb.dk<mailto:tlr at kb.dk>>
A : "netarchivesuite-users at ml.sbforge.org<mailto:netarchivesuite-users at ml.sbforge.org>" <netarchivesuite-users at ml.sbforge.org<mailto:netarchivesuite-users at ml.sbforge.org>>
Date : 29/08/2024 08:42
Objet : Re: [Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl
Envoyé par : "NetarchiveSuite-users" <netarchivesuite-users-bounces at ml.sbforge.org<mailto:netarchivesuite-users-bounces at ml.sbforge.org>>
________________________________
Hello Peter,
It depends on your maximum message queue settings in the OpenMQ JMS broker properties and on how long the network is down. Our installation breaks down if it loses the network connection for more than 10-30 minutes, depending on what is running.
If the message queues hit the maximum, there is normally no other way than to restart the whole platform, and all running harvester jobs need to be restarted.
You can try to restart the GUI; it will try to empty the JMS queues. And sometimes, after the network is OK again, if you have patience and wait 2-3 hours, the broker will try to resolve the message queues; it succeeded in doing so last time.
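For reference, a hedged example of the kind of OpenMQ destination-limit properties meant above (broker config.properties); the values are illustrative only, not recommended settings:

    # Illustrative OpenMQ broker properties; values are examples, not recommendations.
    imq.autocreate.destination.maxNumMsgs=100000
    imq.autocreate.destination.maxTotalMsgBytes=-1
    imq.autocreate.destination.limitBehavior=REJECT_NEWEST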
Best regards
Tue
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Peter Svanberg
Sent: Wednesday, August 28, 2024 6:01 PM
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl
Hello!
There will be network service work at our site next weekend (7-8 Aug.). No figures on the outage, but they will be changing hardware, so probably many minutes, maybe hours. Our current broad crawl pass will perhaps not be finished by then. How do you minimize the consequences in NAS?
Pause all running jobs beforehand and unpause them afterwards? That would minimize the effects on the ongoing crawls.
But what will happen to the other processes and connections? Will the processes have to be restarted? And do the unpaused jobs, once they are ready, also need a restart to make them reconnect? (I'm guessing wildly / groping blindly …) Does anyone have experience with this?
Peter Svanberg
Technical officer
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit
National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se
www.kb.se