[Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out

Peter Svanberg Peter.Svanberg at kb.se
Thu Nov 9 11:35:16 CET 2023


Thank you Tue, I'll look into those config aspects. The imqcmd command was:

imqcmd metrics dst -passfile passfile -u admin -t q -n PLIKT_COMMON_HCHAN_VAL_RESP -m rts

But I forgot to mention that this happens on totally inactive servers, both admin and harvesters!

So some strange looping seems to start under certain conditions, where the openmq process keeps sending messages to the harvester instances as fast as it can. (See my second e-mail from yesterday.)

-----
Peter Sv.


From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Tue Hejlskov Larsen
Sent: 8 November 2023 15:46
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out


Hello Peter



A couple of years ago we had similar problems with too high a volume of messages/communication between the harvesters, the HarvestJobManager and the GUIApplication. The GUI hung and the message queues were overloaded with too much traffic. We lowered the number of calls, minimized retries, and delayed answers and responses in the settings files and perhaps also in the code. Colin did the last part; I don't remember where in the code.



I have attached the relevant common snippets from our GUIApplication, HarvestJobManager and broadcrawl Harvester settings files.



If the trick is hidden in the default values section of the NAS code, I need to investigate that further together with Colin.



Which options do you use for your imqcmd listing?

Normally I only use imqcmd list dst to see if there is something wrong with the JMS queues.



Best regards

Tue


    <jms>
        ..
        <retries>10</retries>
        ..
    </jms>
    ..
    <jmx>
        ..
        <timeout>120</timeout>
        ..
    </jmx>
    ..
    <monitor>
        ..
        <jmxProxyTimeout>500</jmxProxyTimeout>
        ..
        <reregisterDelay>10</reregisterDelay>
    </monitor>

    <heritrix>
        <inactivityTimeout>1800</inactivityTimeout>
        <noresponseTimeout>1800</noresponseTimeout>
        <crawlLoopWaitTime>60</crawlLoopWaitTime>
        ..
    </heritrix>
    <frontier>
        <!-- 2 minutes -->
        <frontierReportWaitTime>120</frontierReportWaitTime>
        ..
    </frontier>









________________________________
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Peter Svanberg <Peter.Svanberg at kb.se>
Sent: 8 November 2023 14:04
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out


We have recurring problems (in July in both test and production, and now just in test) with a flood of messages on the queue used for sending HarvesterRegistrationResponse.



Over 6000 messages per second (see below) in both directions, figures which are normally 0.



It is cured by stopping all harvesters, restarting openmq on the admin server and then starting the harvesters again.



Has anyone else had this? Any hints on what it could be? Or on whether we should debug in some way before we restart?



Currently the openmq process gets 120 % CPU while the NAS processes get less than 1 % now and then. I suppose this indicates that something is looping internally in the openmq process. But why ...?



Displaying destination metrics where:

-----------------------------------------------
Destination Name               Destination Type
-----------------------------------------------
PLIKT_COMMON_HCHAN_VAL_RESP    Queue

On the broker specified by:

-------------------------
Host         Primary Port
-------------------------
localhost    7676

-----------------------------------------------------------------------------
 Msgs/sec    Msg Bytes/sec        Msg Count      Total Msg Bytes (k)  Largest
  In   Out       In      Out  Current  Peak  Avg  Current  Peak  Avg  Msg (k)
-----------------------------------------------------------------------------
   0     0        0        0        9    16   10        6    12    8      < 1
5848  5847  4650296  4649825       10    16   10        7    12    8      < 1
5920  5921  4707657  4707976       10    16   10        7    12    8      < 1
5818  5817  4626266  4625949       12    16   10        9    12    8      < 1
5784  5784  4599286  4599763        8    16   10        6    12    8      < 1
5970  5969  4747221  4746267       15    16   10       11    12    8      < 1
5830  5831  4635982  4636777       10    16   10        7    12    8      < 1
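
A minimal sketch, in Python, of how samples like the ones above could be checked automatically before deciding to restart the broker. It assumes the imqcmd output has been captured as plain text. The function names, the 1000 msgs/sec threshold and the three-sample window are assumptions chosen for illustration; none of them come from imqcmd or NetarchiveSuite.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One data row of `imqcmd metrics dst ... -m rts` output (first four columns)."""
    msgs_in: int
    msgs_out: int
    bytes_in: int
    bytes_out: int


def parse_rts_rows(text):
    """Return a Sample for every line that starts with four integer columns.

    Headers, rule lines and the broker/destination tables are skipped
    because they do not begin with four numeric fields.
    """
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and all(f.isdigit() for f in fields[:4]):
            samples.append(Sample(*(int(f) for f in fields[:4])))
    return samples


def looks_like_storm(samples, threshold=1000, min_consecutive=3):
    """True if in AND out rates stay at or above `threshold` for
    `min_consecutive` samples in a row (both values are hypothetical defaults)."""
    run = 0
    for s in samples:
        if s.msgs_in >= threshold and s.msgs_out >= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


def mean_message_size(samples):
    """Average bytes per incoming message over all samples (0.0 if no messages)."""
    total_msgs = sum(s.msgs_in for s in samples)
    total_bytes = sum(s.bytes_in for s in samples)
    return total_bytes / total_msgs if total_msgs else 0.0
```

Applied to the six busy samples above, the average works out to roughly 795 bytes per message, which is consistent with the "< 1" k value in the Largest Msg column.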





Peter Svanberg

Technical officer

Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit



National Library of Sweden

PO Box 5039, SE-102 41 Stockholm

Visits: Karlavägen 96, Stockholm

+46 10-709 32 78

Peter.Svanberg at kb.se

www.kb.se




