[Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out

Thu Nov 9 13:14:25 CET 2023

Peter

I have never heard of that before.

Here is the result of the same command on our platform (see below).

We currently have 125 crawling Heritrix instances on 15 physical and 5 virtual servers with different specs at 2 locations.

The newest and most powerful broadcrawl server has 18 Heritrix instances, with about 3 TB of fast disk, 64 GB RAM and 32 CPUs. Every server has its own active firewall.

I can send you more server specs if you need them.

Displaying destination metrics where:

----------------------------------------------
Destination Name              Destination Type
----------------------------------------------
PROD_COMMON_HCHAN_VAL_RESP    Queue

On the broker specified by:

-------------------------
Host         Primary Port
-------------------------
localhost    7676

----------------------------------------------------------------------------------
 Msgs/sec    Msg Bytes/sec       Msg Count         Total Msg Bytes (k)     Largest
 In   Out     In      Out    Current  Peak  Avg  Current   Peak     Avg    Msg (k)
----------------------------------------------------------------------------------
 0     0      0        0       83     190   85     64      147      60       < 1
1713  1714  1360036  1360491    69     190   85     53      147      60       < 1
1670  1672  1325993  1327578    71     190   85     55      147      60       < 1
1666  1667  1322654  1323287    67     190   85     51      147      60       < 1
1720  1717  1365280  1362899    79     190   85     61      147      60       < 1
1754  1754  1392373  1392530    73     190   85     56      147      60       < 1
1680  1678  1333359  1332097    85     190   85     65      147      60       < 1
1690  1692  1341715  1342826    78     190   85     60      147      60       < 1
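
As a rough cross-check of the figures above, dividing Msg Bytes/sec by Msgs/sec gives the average message size. The following is only a minimal Python sketch, assuming the sample rows are pasted in as plain text; the function name and the SAMPLE variable are made up for illustration and are not part of imqcmd or NetarchiveSuite.

```python
def avg_message_size(rows):
    """Return bytes per inbound message for each sample row that has traffic."""
    sizes = []
    for line in rows.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines in pasted output
        msgs_in, bytes_in = int(fields[0]), int(fields[2])
        if msgs_in > 0:  # skip the initial all-zero sample
            sizes.append(bytes_in / msgs_in)
    return sizes


# Three rows copied from the output above.
SAMPLE = """\
 0     0      0        0       83     190   85     64      147      60       < 1
1713  1714  1360036  1360491    69     190   85     53      147      60       < 1
1670  1672  1325993  1327578    71     190   85     55      147      60       < 1
"""

for size in avg_message_size(SAMPLE):
    print(round(size))  # 794 for both rows with traffic
```

So each message on this queue is a little under 800 bytes, which fits the "< 1" in the Largest Msg (k) column.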

We have no connection issues or overload during the current broadcrawl harvest. The harvesters start and stop without any help, and all harvesters are running at the moment.

The only exception is when we start the broadcrawl: it must be done in steps. That is very important, and we need to be patient if we have started too many at once, because it can then take 2-3 hours before all harvesters are connected and running.

It is done that way because we earlier had a lot of problems with jobs that moved from the New to the Submitted queue, and some hung there forever until we restarted the broker without emptying the queues.
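
A stepwise start of this kind could be scripted roughly as follows. This is only a sketch of the idea, not the procedure actually used: the `start` callback, the batch size and the pause length are all hypothetical placeholders, since the thread does not say how the harvesters are launched or how long to wait between steps.

```python
import time


def start_in_steps(harvesters, start, batch_size=5, pause_seconds=600,
                   sleep=time.sleep):
    """Start harvesters batch by batch, pausing between batches.

    `start` is a caller-supplied function that launches one harvester.
    batch_size and pause_seconds are illustrative values only.
    """
    started = []
    for i in range(0, len(harvesters), batch_size):
        for name in harvesters[i:i + batch_size]:
            start(name)
            started.append(name)
        if i + batch_size < len(harvesters):
            # Give this batch time to connect before starting the next one.
            sleep(pause_seconds)
    return started
```

Passing `sleep` as a parameter makes it easy to dry-run the script without actually waiting.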

We will only have a quiet platform when we cold-start the whole platform in connection with the deployment of a new NAS version. That will not happen until January 2024.

Best regards

Tue

________________________________
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Peter Svanberg <Peter.Svanberg at kb.se>
Sent: 9 November 2023 11:35
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out

Thank you Tue, I’ll look into those config aspects. The imqcmd command was:

imqcmd metrics dst -passfile passfile -u admin -t q -n PLIKT_COMMON_HCHAN_VAL_RESP -m rts

But I forgot to mention that this happens on totally inactive servers, both admin and harvesters!

So some strange looping seems to start under certain conditions, where the openmq process keeps sending messages to the harvester instances as fast as it can. (See my e-mail 2 yesterday.)

-----
Peter Sv.

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Tue Hejlskov Larsen
Sent: 8 November 2023 15:46
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out

Hello Peter

A couple of years ago we had similar problems with too high a volume of messages between the harvesters, the HarvestJobManager and the GUIApplication. The GUI hung and the message queues were overloaded with too much traffic. We lowered the number of calls, minimized retries, and delayed answers and responses in the settings files, and perhaps also in the code. Colin did the latter; I don't remember where in the code.

I have attached relevant common snippets from our GUIApplication, HarvestJobManager and broadcrawl Harvester settings files.

If the trick is hidden in the default values in the NAS code, I need to investigate that further together with Colin.

Which options do you use for your imqcmd listing?

Normally I only use imqcmd list dst to see if there is something wrong with the JMS queues.

Best regards

Tue

<jms>
    ..
    <retries>10</retries>
    ..
</jms>
..
<jmx>
    ..
    <timeout>120</timeout>
    ..
</jmx>
..
<monitor>
    ..
    <jmxProxyTimeout>500</jmxProxyTimeout>
    ..
    <reregisterDelay>10</reregisterDelay>
</monitor>

<heritrix>
    <inactivityTimeout>1800</inactivityTimeout>
    <noresponseTimeout>1800</noresponseTimeout>
    <crawlLoopWaitTime>60</crawlLoopWaitTime>
    ..
</heritrix>

<frontier>
    <!-- 2 minutes -->
    <frontierReportWaitTime>120</frontierReportWaitTime>
    ..
</frontier>

________________________________

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Peter Svanberg <Peter.Svanberg at kb.se>
Sent: 8 November 2023 14:04
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out

We have recurring problems (in July in both test and production, and now just in test) with heavy message traffic on the queue used for sending HarvesterRegistrationResponse.

Over 6000 messages per second (see below) in both directions, figures which are normally 0.

It is cured by stopping all harvesters, restarting openmq on the admin server and then starting the harvesters again.

Has anyone else had this? Any hints on what it could be? Or on whether we should debug in some way before we restart?

Currently the openmq process gets 120 % CPU while the NAS processes get less than 1 % now and then. I suppose this indicates that something is looping internally in the openmq process. But why …?

Displaying destination metrics where:

-----------------------------------------------
Destination Name               Destination Type
-----------------------------------------------
PLIKT_COMMON_HCHAN_VAL_RESP    Queue

On the broker specified by:

-------------------------
Host         Primary Port
-------------------------
localhost    7676

----------------------------------------------------------------------------------
 Msgs/sec    Msg Bytes/sec       Msg Count         Total Msg Bytes (k)     Largest
 In   Out     In      Out    Current  Peak  Avg  Current   Peak     Avg    Msg (k)
----------------------------------------------------------------------------------
 0     0      0        0        9      16   10      6       12       8       < 1
5848  5847  4650296  4649825    10      16   10      7       12       8       < 1
5920  5921  4707657  4707976    10      16   10      7       12       8       < 1
5818  5817  4626266  4625949    12      16   10      9       12       8       < 1
5784  5784  4599286  4599763     8      16   10      6       12       8       < 1
5970  5969  4747221  4746267    15      16   10     11       12       8       < 1
5830  5831  4635982  4636777    10      16   10      7       12       8       < 1
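
Since these rates are normally 0 on an idle platform, the condition shown in the rows above could be checked mechanically before deciding to restart. The following Python sketch is only an illustration: the function name and the threshold of 1000 messages per second are arbitrary assumptions, and it assumes the first printed sample is an all-zero baseline, as it is in the output above.

```python
def sustained_traffic(rows, threshold=1000):
    """True if every sample after the first shows both the in and out
    message rates above `threshold` (an arbitrary illustrative value)."""
    samples = []
    for line in rows.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines in pasted output
        samples.append((int(fields[0]), int(fields[1])))
    later = samples[1:]  # skip the baseline sample
    return bool(later) and all(i > threshold and o > threshold for i, o in later)


# Three rows copied from the output above.
SAMPLE = """\
 0     0      0        0        9      16   10      6       12       8       < 1
5848  5847  4650296  4649825    10      16   10      7       12       8       < 1
5920  5921  4707657  4707976    10      16   10      7       12       8       < 1
"""

print(sustained_traffic(SAMPLE))  # True
```

A check like this only says that traffic is sustained; whether that is abnormal depends on whether the platform is supposed to be idle at the time.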

Peter Svanberg

Technical officer

Aquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit

National Library of Sweden

PO Box 5039, SE-102 41 Stockholm

Visits: Karlavägen 96, Stockholm

+46 10-709 32 78

Peter.Svanberg at kb.se

www.kb.se
