[Netarchivesuite-users] Strange slow non-existing-domain behavior

Peter Svanberg Peter.Svanberg at kb.se
Fri Apr 26 12:46:22 CEST 2019


Hmm, I realize I have two parameters that are both set to 300 seconds:
    frontier.retryDelaySeconds=300
    frontier.snoozeLongMs=300000

But I don’t see any “,2t” or “,3t” in these passages and the harvester doesn’t do anything else, so why snooze?

And in another job I get 10-second pauses. And no “Details and Actions” page in the GUI … (Not a good NAS day. ☹)

But the weather is quite nice!

/Peter


From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Peter Svanberg
Sent: 26 April 2019 11:02
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Strange slow non-existing-domain behavior

Now I have discovered similar behavior, but with a 404 status, a 300-second wait, and no problem with the domain (wget gets a quick answer). Is it the same issue, the one solved in 5.5?

2019-04-26T08:14:38.108Z   404        449 http://adcove.se/contactform.error_changefontsize_no_size REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426081438022+85 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
2019-04-26T08:19:38.243Z   404        449 http://adcove.se/contactform.error_changeFormTitle_no_value REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426081938163+80 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
2019-04-26T08:24:38.389Z   404        449 http://adcove.se/contactform.error_changegoallink_no_source REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426082438299+90 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
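
To put numbers on the wait, here is a minimal Python sketch that measures the gap between successive crawl.log lines and looks for a retry annotation in the last field. The function names (gaps_seconds, retry_counts) are made up for illustration, and the sketch assumes that the timestamp is the first whitespace-separated field and that a retry count, when present, appears as an "Nt" item in the comma-separated last field.

```python
import re
from datetime import datetime

# The three 404 lines quoted above, with the wrapped URLs rejoined.
LOG = """\
2019-04-26T08:14:38.108Z   404        449 http://adcove.se/contactform.error_changefontsize_no_size REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426081438022+85 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
2019-04-26T08:19:38.243Z   404        449 http://adcove.se/contactform.error_changeFormTitle_no_value REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426081938163+80 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
2019-04-26T08:24:38.389Z   404        449 http://adcove.se/contactform.error_changegoallink_no_source REX http://55b558c7-resources.builder.misssite.com/ea01a9c/en/translations.js?sections=widgets,mobile,shared_views,shared_components,cookie text/html #032 20190426082438299+90 sha1:7BBNH63Q5ARINIOLBLMA6MVZODMSSUSD http://www.adcove.se content-size:828
"""

# Assumed annotation format: a retry count such as "2t" or "3t" as one
# comma-separated item in the last field of the line.
RETRY = re.compile(r"(?:^|,)(\d+)t(?:,|$)")


def _lines(log_text):
    """Yield the whitespace-split fields of each non-empty log line."""
    for line in log_text.splitlines():
        fields = line.split()
        if fields:
            yield fields


def gaps_seconds(log_text):
    """Return the seconds between consecutive log lines, oldest first."""
    stamps = sorted(
        datetime.strptime(f[0], "%Y-%m-%dT%H:%M:%S.%fZ") for f in _lines(log_text)
    )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


def retry_counts(log_text):
    """Return the retry count per line, or None where no 'Nt' item is found."""
    counts = []
    for f in _lines(log_text):
        match = RETRY.search(f[-1])
        counts.append(int(match.group(1)) if match else None)
    return counts


if __name__ == "__main__":
    print([round(g, 3) for g in gaps_seconds(LOG)])
    print(retry_counts(LOG))
```

On these three lines the gaps come out at just over 300 seconds each, and none of the lines carries a retry annotation, which matches what is described in this message.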

-----

Peter Svanberg

National Library of Sweden
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se



From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Tue Hejlskov Larsen
Sent: 21 March 2019 06:16
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Strange slow non-existing-domain behavior

Hi Peter

We also had trouble with DNS spam in 5.4.2.
Yes, it is fixed in 5.5.

Best regards
Tue

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Peter Svanberg
Sent: Wednesday, March 20, 2019 11:33 PM
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Strange slow non-existing-domain behavior

Hello again!

Spurred by your previous problem-solving answers, I continue.

Strange Heritrix behavior: it does a DNS lookup, which fails, and reports that with a -6 line. Then a 10-minute pause. Then a new DNS lookup, and so on.

What happens during the pause? Is it waiting on the DNS lookup for 600 seconds? Is it trying the request despite the failed lookup?

(Maybe one of the bugs fixed in 5.5?)

Log and template below.

Best regards,
-----

Peter Svanberg
Technical officer
Digital Collections Department, Newspapers, Radio and Television Division

National Library of Sweden
PO Box 5039
SE-104 51 Stockholm
Visits: Karlavägen 100, Stockholm
Phone: +46 10 709 32 78

E-mail: peter.svanberg at kb.se
Web: www.kb.se




crawl log:

2019-03-20T21:48:42.119Z    -6          - http://lookbackvideo7-a.akamaihd.net/ RRX https://www.facebook.com/ unknown #033 - - http://www.fbcdn.net 2t
2019-03-20T21:48:41.164Z    -1          - dns:lookbackvideo7-a.akamaihd.net RRXP http://lookbackvideo7-a.akamaihd.net/ text/dns #047 20190320214841119+45 - http://www.fbcdn.net 3t
2019-03-20T21:38:41.006Z    -6          - http://lookbackvideo6-a.akamaihd.net/ RRX https://www.facebook.com/ unknown #024 - - http://www.fbcdn.net 2t
2019-03-20T21:38:40.063Z    -1          - dns:lookbackvideo6-a.akamaihd.net RRXP http://lookbackvideo6-a.akamaihd.net/ text/dns #026 20190320213840006+56 - http://www.fbcdn.net 3t
2019-03-20T21:28:39.896Z    -6          - http://lookbackvideo5-a.akamaihd.net/ RRX https://www.facebook.com/ unknown #045 - - http://www.fbcdn.net 2t
2019-03-20T21:28:38.942Z    -1          - dns:lookbackvideo5-a.akamaihd.net RRXP http://lookbackvideo5-a.a
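
To see how the -1 and -6 lines pair up, here is a rough Python sketch that groups the complete log lines above by host and lists the status codes recorded for each. The helper names (host_of, codes_by_host) are made up for illustration. The code descriptions are my reading of the Heritrix status-code list (-1: DNS lookup failed; -6: prerequisite domain lookup failed, so the fetch was not attempted) and should be treated as an assumption. The truncated last line is left out.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed meanings, taken from my reading of the Heritrix status-code list.
STATUS = {
    "-1": "DNS lookup failed",
    "-6": "prerequisite domain lookup failed, fetch not attempted",
}

# The four complete lines for lookbackvideo7 and lookbackvideo6 from the log above.
LOG = """\
2019-03-20T21:48:42.119Z    -6          - http://lookbackvideo7-a.akamaihd.net/ RRX https://www.facebook.com/ unknown #033 - - http://www.fbcdn.net 2t
2019-03-20T21:48:41.164Z    -1          - dns:lookbackvideo7-a.akamaihd.net RRXP http://lookbackvideo7-a.akamaihd.net/ text/dns #047 20190320214841119+45 - http://www.fbcdn.net 3t
2019-03-20T21:38:41.006Z    -6          - http://lookbackvideo6-a.akamaihd.net/ RRX https://www.facebook.com/ unknown #024 - - http://www.fbcdn.net 2t
2019-03-20T21:38:40.063Z    -1          - dns:lookbackvideo6-a.akamaihd.net RRXP http://lookbackvideo6-a.akamaihd.net/ text/dns #026 20190320213840006+56 - http://www.fbcdn.net 3t
"""


def host_of(uri):
    """Return the host name for a dns: pseudo-URI or an ordinary URL."""
    if uri.startswith("dns:"):
        return uri[len("dns:"):]
    return urlsplit(uri).hostname


def codes_by_host(log_text):
    """Map each host to the status codes logged for it, in log order."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Fields: timestamp, status code, size, URI, ...
        result[host_of(fields[3])].append(fields[1])
    return dict(result)


if __name__ == "__main__":
    for host, codes in codes_by_host(LOG).items():
        described = [f"{c} ({STATUS.get(c, 'unknown')})" for c in codes]
        print(host, "->", ", ".join(described))
```

Each host shows up with exactly one -1 line for the dns: lookup and one -6 line for the page itself, about a second apart, so the roughly ten-minute gap falls between one host and the next and not between the two lines of a pair.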

template:

fetchDns.enabled=true
fetchDns.acceptNonDnsResolves=false
fetchDns.digestContent=true
fetchDns.digestAlgorithm=sha1

fetchHttp.enabled=true
fetchHttp.timeoutSeconds=1200
fetchHttp.soTimeoutMs=20000
fetchHttp.maxFetchKBSec=0
fetchHttp.maxLengthBytes=0
fetchHttp.ignoreCookies=false
fetchHttp.sslTrustLevel=OPEN
fetchHttp.defaultEncoding=UTF-8
fetchHttp.digestContent=true
fetchHttp.digestAlgorithm=sha1
fetchHttp.sendIfModifiedSince=true
fetchHttp.sendIfNoneMatch=true
fetchHttp.sendConnectionClose=true
fetchHttp.sendReferer=true
fetchHttp.sendRange=false

