[Netarchivesuite-users] Lots of -50 status codes

Peter Svanberg Peter.Svanberg at kb.se
Fri Jun 14 16:02:37 CEST 2019


Ahh, interesting aspect! It seems that the DNS lookups are stopped by the quota, but other, later requests are not; those get the -50 status.

Could it be that the missing DNS lookup is detected (giving -50) before the quota check is made?

2019-06-14T10:36:29.804Z   -50          - http://vasterasmotorstadion.se/xmlrpc.php?rsd RLLRE http://vasterasmotorstadion.se/ unknown #048 - - http://www.aktuellmotorsport.se 3t
2019-06-14T10:36:29.650Z -5003          - dns:vasterasmotorstadion.se RLLREP http://vasterasmotorstadion.se/xmlrpc.php?rsd unknown #048 - - http://www.aktuellmotorsport.se Q:groupMaxAllKb
2019-06-14T10:36:29.427Z -5003          - dns:vasterasmotorstadion.se RLLREP http://vasterasmotorstadion.se/xmlrpc.php?rsd unknown #048 - - http://www.aktuellmotorsport.se Q:groupMaxAllKb

How can I avoid this? Hmm, a DNS lookup shouldn't be stopped by the quota limit!

Hmm (2), if most of the -50 lines occur when the domain is over its quota limit, then I can ignore them!?
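
One way to test that hypothesis is to count how many -50 lines belong to hosts whose DNS lookup was blocked with -5003. The following is only a rough sketch: it assumes the usual whitespace-separated crawl.log layout seen in the lines above (timestamp, status, size, URI, ...), and the function names are made up for illustration, not part of NetarchiveSuite or Heritrix.

```python
from collections import Counter
from urllib.parse import urlsplit

# The three crawl.log lines quoted above, used as sample input.
SAMPLE_LINES = [
    "2019-06-14T10:36:29.804Z   -50          - http://vasterasmotorstadion.se/xmlrpc.php?rsd RLLRE http://vasterasmotorstadion.se/ unknown #048 - - http://www.aktuellmotorsport.se 3t",
    "2019-06-14T10:36:29.650Z -5003          - dns:vasterasmotorstadion.se RLLREP http://vasterasmotorstadion.se/xmlrpc.php?rsd unknown #048 - - http://www.aktuellmotorsport.se Q:groupMaxAllKb",
    "2019-06-14T10:36:29.427Z -5003          - dns:vasterasmotorstadion.se RLLREP http://vasterasmotorstadion.se/xmlrpc.php?rsd unknown #048 - - http://www.aktuellmotorsport.se Q:groupMaxAllKb",
]


def host_of(uri):
    """Return the lower-cased host of an http(s) or dns: URI ('' if none)."""
    if uri.startswith("dns:"):
        return uri[4:].lower()
    return (urlsplit(uri).hostname or "").lower()


def deferred_by_quota(lines):
    """Return (explained, total): the number of -50 lines on hosts with a
    -5003 DNS lookup, and the number of -50 lines overall."""
    quota_hosts = set()   # hosts whose dns: URI got status -5003
    deferred = Counter()  # host -> number of -50 lines
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete crawl.log line
        try:
            status = int(fields[1])
        except ValueError:
            continue  # status field is not a number
        uri = fields[3]
        if status == -5003 and uri.startswith("dns:"):
            quota_hosts.add(host_of(uri))
        elif status == -50:
            deferred[host_of(uri)] += 1
    explained = sum(n for host, n in deferred.items() if host in quota_hosts)
    return explained, sum(deferred.values())


if __name__ == "__main__":
    explained, total = deferred_by_quota(SAMPLE_LINES)
    print(f"{explained} of {total} -50 lines are on quota-blocked hosts")
```

To run it on a real log, pass an open file instead of SAMPLE_LINES, e.g. `deferred_by_quota(open("crawl.log", encoding="utf-8"))`. All lines are read before anything is compared, so the order of the -50 and -5003 entries in the log does not matter.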
-----

Peter Svanberg

National Library of Sweden
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se



From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Colin Samuel Rosenthal
Sent: 14 June 2019 14:28
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Lots of -50 status codes


I know we got rid of a lot of -50 codes once we fixed our queue assignment policy to always queue DNS lookups in the same queue as the URLs for the corresponding domain. But all that should be fixed in 5.5. Do you see any problems with DNS lookups?



cheers,

Colin


--
Colin Rosenthal PhD
Senior IT Consultant
Royal Danish Library (Aarhus)
________________________________
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Peter Svanberg <Peter.Svanberg at kb.se>
Sent: Friday, June 14, 2019 11:56:42 AM
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Lots of -50 status codes

Hello all!

I get quite a lot of -50 status lines in my crawl.log (test snapshot runs, NetarchiveSuite 5.5). I haven't seen any pattern in which kinds of URLs get this: sometimes they are URLs that redirect to the entrance/top-level page (should be banned!), sometimes they are quite ordinary URLs, often images, I think. And manual fetching later always works.

One pattern is that, for a given host, it seems that either

· all requests are -50, or

· there is first a series of 200s and then a series of -50s,

so they are not intermixed. That could imply some problem on the host, or perhaps automatic blocking after a while?
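
That observation could be checked across a whole crawl.log by classifying each host by the order in which its -50 lines appear. The following is only a sketch under assumptions: it takes the usual whitespace-separated crawl.log fields (timestamp first, status second, URI fourth), the function name and category labels are invented for illustration, and the sample lines use made-up example hosts rather than real crawl data.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Made-up sample lines (hypothetical hosts), in crawl.log field order.
SAMPLE_LINES = [
    "2019-06-14T10:00:01.000Z 200 512 http://a.example/1 L http://a.example/ text/html #001 - - seed -",
    "2019-06-14T10:00:02.000Z -50 - http://a.example/2 L http://a.example/ unknown #001 - - seed -",
    "2019-06-14T10:00:03.000Z -50 - http://a.example/3 L http://a.example/ unknown #001 - - seed -",
    "2019-06-14T10:00:04.000Z -50 - http://b.example/1 L http://b.example/ unknown #002 - - seed -",
    "2019-06-14T10:00:05.000Z 200 512 http://c.example/1 L http://c.example/ text/html #003 - - seed -",
    "2019-06-14T10:00:06.000Z -50 - http://c.example/2 L http://c.example/ unknown #003 - - seed -",
    "2019-06-14T10:00:07.000Z 200 512 http://c.example/3 L http://c.example/ text/html #003 - - seed -",
]


def classify_hosts(lines):
    """Map each host to 'no -50', 'all -50', 'ok then -50' or 'intermixed'."""
    per_host = defaultdict(list)  # host -> [(timestamp, is_deferred), ...]
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[3].startswith("dns:"):
            continue  # skip incomplete lines and DNS lookups
        try:
            status = int(fields[1])
            host = (urlsplit(fields[3]).hostname or "").lower()
        except ValueError:
            continue  # non-numeric status or unparsable URI
        per_host[host].append((fields[0], status == -50))

    result = {}
    for host, entries in per_host.items():
        # ISO 8601 timestamps in the same time zone sort correctly as strings.
        flags = [is_deferred for _, is_deferred in sorted(entries)]
        if not any(flags):
            result[host] = "no -50"
        elif all(flags):
            result[host] = "all -50"
        elif all(flags[flags.index(True):]):
            result[host] = "ok then -50"  # once -50 starts, it never stops
        else:
            result[host] = "intermixed"
    return result


if __name__ == "__main__":
    for host, pattern in sorted(classify_hosts(SAMPLE_LINES).items()):
        print(host, pattern)
```

If nearly every host with -50 lines ends up in "all -50" or "ok then -50", that would support the impression that the codes are not intermixed; a large "intermixed" group would point to something else.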

What are your experiences?

-----

Peter Svanberg

National Library of Sweden
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se

