[Netarchivesuite-users] Timelimit reporting in NAS

Peter Svanberg Peter.Svanberg at kb.se
Thu Jun 16 14:03:42 CEST 2022


I meant: perhaps NAS can't determine whether a domain is completed when the job is stopped by the time limit? Or can it?

-----
Peter Svanberg


From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Peter Svanberg
Sent: 16 June 2022 13:24
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Timelimit reporting in NAS

Hello!

I posted this on the IIPC Slack: https://iipc.slack.com/archives/C2F63EUV7/p1655309749279209

A job with 500 domains and a time limit (crawlLimiter.maxTimeSeconds) of 24 hours.
Policies: SeedUriDomainnameQueueAssignmentPolicy, HopsUriPrecedencePolicy, UnitCostAssignmentPolicy.

After a while, only one or two domains are still being fetched, with a few seconds of politeness delay between fetches, until the time limit stops the job. The job report shows many domains stopped due to "Harvester timelimit reached". Why are there no fetches from those domains in between the slow ones at the end? Is this a configuration, template, or profile mistake?

Now I found https://sbforge.org/jira/browse/NAS-2065 and wonder whether this could be a NAS problem, or a deliberate shortcut. Perhaps NAS can't get time-limit information at the domain level? In my examples, none of the 500 domains were reported as completed.



Peter Svanberg
Technical officer
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit

National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se<mailto:Peter.Svanberg at kb.se>
www.kb.se<https://www.kb.se/>




