[Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out
Tue Hejlskov Larsen
tlr at kb.dk
Wed Nov 8 15:46:21 CET 2023
Hello Peter
A couple of years ago we had similar problems with too high a volume of messages/communication between the harvesters, the HarvestJobManager and the GUIApplication. The GUI hung and the message queues were overloaded with too much traffic. We lowered the number of calls, minimized retries, and delayed answers and responses in the settings files, and perhaps also in the code. Colin did the last part; I don't remember where in the code.
I have attached the relevant common snippets from our GUIApplication, HarvestJobManager and broad-crawl Harvester settings files.
If the trick is hidden in the NAS code section for default values, I need to investigate that further together with Colin.
Which options do you use for your imqcmd listing?
Normally I only use imqcmd list dst to see if there is something wrong with the JMS queues.
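A minimal sketch of what I run, assuming a broker on localhost:7676 and the default admin user (both assumptions; adjust to your installation):

# List all destinations and their state on the broker.
imqcmd list dst -b localhost:7676 -u admin
# Per-destination detail (message and consumer counts) for one suspect queue;
# the queue name here is the one from your output below.
imqcmd query dst -t q -n PLIKT_COMMON_HCHAN_VAL_RESP -b localhost:7676 -u admin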
Best regards
Tue
<jms>
  ..
  <retries>10</retries>
  ..
</jms>
..
<jmx>
  ..
  <timeout>120</timeout>
  ..
</jmx>
..
<monitor>
  ..
  <jmxProxyTimeout>500</jmxProxyTimeout>
  ..
  <reregisterDelay>10</reregisterDelay>
</monitor>
<heritrix>
  <inactivityTimeout>1800</inactivityTimeout>
  <noresponseTimeout>1800</noresponseTimeout>
  <crawlLoopWaitTime>60</crawlLoopWaitTime>
  ..
</heritrix>
<frontier>
  <!-- 2 minutes -->
  <frontierReportWaitTime>120</frontierReportWaitTime>
  ..
</frontier>
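To verify that the deployed settings files actually carry these values, something like the following works; a sketch only, where the settings path is hypothetical and xmllint ships with libxml2:

# Read single values out of a deployed NetarchiveSuite settings file.
xmllint --xpath 'string(//jms/retries)' /home/netarkiv/settings.xml
xmllint --xpath 'string(//heritrix/inactivityTimeout)' /home/netarkiv/settings.xml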
________________________________
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Peter Svanberg <Peter.Svanberg at kb.se>
Sent: 8 November 2023 14:04
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Messaging (openmq) problems: looping (?) messages, > 6000 per second in and out
We have recurring problems (in July in both test and production, and now just in test) with lots of messaging in the queue used for sending HarvesterRegistrationResponse.
Over 6000 messages per second (see below) in both directions; these figures are normally 0.
It is cured by stopping all harvesters, restarting openmq on the admin server and then starting the harvesters again.
Has anyone else had this? Any hints on what it could be? Or should we debug in some way before we restart?
Currently the openmq process gets 120 % CPU while the NAS processes get less than 1 % now and then. I suppose this indicates that something is looping internally in the openmq process. But why …?
Displaying destination metrics where:

-----------------------------------------------
Destination Name               Destination Type
-----------------------------------------------
PLIKT_COMMON_HCHAN_VAL_RESP    Queue

On the broker specified by:

-------------------------
Host         Primary Port
-------------------------
localhost    7676

----------------------------------------------------------------------------------
Msgs/sec     Msg Bytes/sec       Msg Count          Total Msg Bytes (k)    Largest
In    Out    In       Out        Current Peak Avg   Current Peak Avg       Msg (k)
----------------------------------------------------------------------------------
0     0      0        0          9       16   10    6       12   8         < 1
5848  5847   4650296  4649825    10      16   10    7       12   8         < 1
5920  5921   4707657  4707976    10      16   10    7       12   8         < 1
5818  5817   4626266  4625949    12      16   10    9       12   8         < 1
5784  5784   4599286  4599763    8       16   10    6       12   8         < 1
5970  5969   4747221  4746267    15      16   10    11      12   8         < 1
5830  5831   4635982  4636777    10      16   10    7       12   8         < 1
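A table like the one above can be produced with imqcmd's destination metrics mode; a minimal sketch, where the 5-second interval, the sample count, and the broker address and admin user are assumptions:

# Rate metrics (msgs/sec, bytes/sec) for the suspect queue, ten 5-second samples.
imqcmd metrics dst -t q -n PLIKT_COMMON_HCHAN_VAL_RESP -m rts -int 5 -msp 10 -b localhost:7676 -u admin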
Peter Svanberg
Technical officer
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit
National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se
www.kb.se