[Netarchivesuite-users] Preparing for and handling planned network outage during broad crawl

Fri Aug 30 13:56:26 CEST 2024

Hello Peter,

At BnF, network downtimes can last more than half a day.
We restart neither the NAS applications nor the broker. Unpausing 
the jobs works fine.
We just make sure the next capture dates on the harvests are consistent 
when we reactivate them.

Best,

Sara

From: "Tue Hejlskov Larsen" <tlr at kb.dk>
To: "netarchivesuite-users at ml.sbforge.org" 
<netarchivesuite-users at ml.sbforge.org>
Date: 30/08/2024 13:43
Subject: Re: [Netarchivesuite-users] Preparing for and handling planned 
network outage during broad crawl
Sent by: "NetarchiveSuite-users" 
<netarchivesuite-users-bounces at ml.sbforge.org>

Normally our planned network downtime is under 15 minutes. If the network 
downtime is unplanned and lasts longer, the description below is a 
realistic scenario for the JMS broker messages at our site.

Best regards
Tue

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> 
On Behalf Of Peter Svanberg
Sent: Friday, August 30, 2024 1:29 PM
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Preparing for and handling planned 
network outage during broad crawl

Hello Tue and Sara!

Sara confirms my draft plan.
1. Pause all ongoing jobs, far enough in advance that there is no risk of 
jobs being in post-processing.
2. Deactivate/postpone the start of selective jobs.
3. After the outage: unpause all paused jobs. (Much clicking; maybe it can 
be scripted.)
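
As a rough sketch of how step 3 might be scripted: NetarchiveSuite runs its crawls in Heritrix 3, whose REST API accepts a POST with action=pause or action=unpause on a job's URL. The snippet below only builds such a request. The host name, port and job name are hypothetical, and whether the Heritrix interface of each harvester is reachable this way in a given NAS installation is an assumption to verify first. Actually sending the request would also need digest authentication and, typically, handling of a self-signed certificate.

```python
import urllib.parse
import urllib.request


def build_job_action(base_url: str, job_name: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a Heritrix 3 REST request that pauses or unpauses a job.

    base_url and job_name are placeholders: they must match the Heritrix
    instance that a given harvester actually runs.
    """
    if action not in ("pause", "unpause"):
        raise ValueError(f"unsupported action: {action!r}")
    # Heritrix 3 addresses a job as <engine>/engine/job/<jobname>.
    url = f"{base_url.rstrip('/')}/engine/job/{urllib.parse.quote(job_name)}"
    # The action is sent as a form-encoded POST body.
    data = urllib.parse.urlencode({"action": action}).encode("ascii")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Accept": "application/xml"},
        method="POST",
    )


# Hypothetical usage: one request per paused job, sent after the outage.
# req = build_job_action("https://harvester1.example.org:8443", "myjob", "unpause")
```

Building the requests separately from sending them makes it possible to list and review what would be unpaused before anything is actually changed.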

But what happens then? Do the unpaused jobs continue, and does 
post-processing work?

And if that works, do the queue handling and the connections between the 
main server and all harvester instances remain intact, so that new jobs 
are created, or must something be restarted?

Tue, did you describe what happens if you do no preparation?

Best regards
Peter

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> 
On Behalf Of sara.aubry at bnf.fr
Sent: 29 August 2024 10:42
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Preparing for and handling planned 
network outage during broad crawl

Hello Peter,

Planned network outages are a lot easier to handle than unexpected ones!

At BnF, we pause all running jobs until we have confirmation that 
everything is back to normal. We have already done this during broad 
crawls, so for more than 60 crawlers.
The challenge is to make sure no new job starts, so we temporarily 
deactivate the selective harvests.
If we have jobs in the post-processing stage that we cannot pause, we stop 
the HarvestController by putting a shutdown.txt file in the job 
directory.
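
A minimal sketch of dropping such a marker file, assuming the job directory is already known. The path in the usage comment is hypothetical, and the snippet only creates the file; it does not check how the HarvestController reacts to it.

```python
from pathlib import Path


def request_shutdown(job_dir: Path) -> Path:
    """Create an empty shutdown.txt in the given job directory and return its path."""
    if not job_dir.is_dir():
        raise FileNotFoundError(f"job directory not found: {job_dir}")
    marker = job_dir / "shutdown.txt"
    # touch() creates the file if it is missing and leaves an existing one in place.
    marker.touch(exist_ok=True)
    return marker


# Hypothetical usage; the real job directory depends on the installation.
# request_shutdown(Path("/path/to/harvester/job_directory"))
```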

Best,

Sara

From: "Tue Hejlskov Larsen" <tlr at kb.dk>
To: "netarchivesuite-users at ml.sbforge.org" <
netarchivesuite-users at ml.sbforge.org>
Date: 29/08/2024 08:42
Subject: Re: [Netarchivesuite-users] Preparing for and handling 
planned network outage during broad crawl
Sent by: "NetarchiveSuite-users" <
netarchivesuite-users-bounces at ml.sbforge.org>

Hello Peter,

It depends on the maximum message queue settings in your Open JMS broker 
properties and on how long the network is down. Our installation breaks 
down if it loses the network connection for more than 10 to 30 minutes, 
depending on what is running.
If the message queues hit the maximum, there is normally no other way than 
to restart the whole platform, and all running harvester jobs need to be 
restarted.
You can try restarting the GUI; it will try to empty the JMS queues. And 
sometimes, once the network is OK again and if you have the patience to 
wait 2-3 hours, the broker will try to resolve the message queues; it 
succeeded in doing so last time.

Best regards
Tue

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> 
On Behalf Of Peter Svanberg
Sent: Wednesday, August 28, 2024 6:01 PM
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Preparing for and handling planned 
network outage during broad crawl

Hello!

There will be network service work at our site next weekend (7-8 Sept.). No 
figures on the outage duration, but they will change hardware, so probably 
many minutes, maybe hours. Our current broad crawl pass may not be finished 
by then. How do you minimize the consequences in NAS?

Pause all running jobs before and unpause them after? That would minimize 
the effects on the ongoing crawls.

But what will happen with the other processes and connections? Will the 
processes have to be restarted? And the unpaused jobs too, once they are 
finished, to make them reconnect? (I'm guessing wildly/groping blindly …) 
Does anyone have experience with this?

Peter Svanberg
Technical officer 
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit

National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se
www.kb.se

 _______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users
