[Netarchivesuite-users] Limit both number of bytes and number of objects per domain
sara.aubry at bnf.fr
Tue Aug 30 14:02:29 CEST 2022
Hi Peter,
I can't give you a technical answer, but QuotaEnforcer (a processor) and
queueTotalBudget (a frontier property) are two separate mechanisms and
have not been implemented in Heritrix to work together.
At BnF, we use queueTotalBudget to limit queues by number of URLs.
Here is what we have in our profiles:
<!-- FRONTIER (START)
     Record of all URIs discovered and queued-for-collection
-->
<bean id="frontier" class="org.archive.crawler.frontier.BdbFrontier">
  <property name="maxRetries" value="10" />
  <property name="retryDelaySeconds" value="60" />
  <property name="recoveryLogEnabled" value="false" />
  <property name="balanceReplenishAmount" value="1000" />
  <property name="errorPenaltyAmount" value="1" />
  <!-- NETARCHIVESUITE Placeholder
       FRONTIER_QUEUE_TOTAL_BUDGET_PLACEHOLDER -->
  <property name="queueTotalBudget"
            value="%{FRONTIER_QUEUE_TOTAL_BUDGET_PLACEHOLDER}" />
  <property name="snoozeLongMs" value="300000" />
  <property name="extract404s" value="false" />
</bean>
<!-- FRONTIER (END) -->
We have no placeholder for the quotaEnforcer.
Best,
Sara
From: "Peter Svanberg" <Peter.Svanberg at kb.se>
To: "netarchivesuite-users at ml.sbforge.org"
<netarchivesuite-users at ml.sbforge.org>
Date: 30/08/2022 13:41
Subject: Re: [Netarchivesuite-users] Limit both number of bytes and number
of objects per domain
Sent by: "NetarchiveSuite-users"
<netarchivesuite-users-bounces at ml.sbforge.org>
Sorry, I mixed it up; alt. 3 is edited below. So I now suppose that alt. 3
is true, and that the value of frontier.queueTotalBudget is irrelevant if
you use quotaenforcer, i.e. if <ref bean="quotaenforcer"/> is among the
fetchProcessors.processors. True?
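For concreteness, the case meant here would look roughly like the sketch
below. Only the bean id "fetchProcessors", the "processors" property and
the quotaenforcer reference come from this thread; the FetchChain class
name and the position of the reference in the list are assumptions based
on a typical Heritrix 3 profile, and the other processor references are
left out.

```xml
<!-- Sketch only, not a complete profile. -->
<bean id="fetchProcessors" class="org.archive.modules.FetchChain">
  <property name="processors">
    <list>
      <!-- ... other processor refs omitted (assumed present) ... -->
      <ref bean="quotaenforcer"/>
      <!-- ... remaining processor refs omitted ... -->
    </list>
  </property>
</bean>
```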
But there is a rumour that you should choose between a byte limit and an
object limit – true or false?
Regards,
-----
Peter Svanberg
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org>
On behalf of Peter Svanberg
Sent: 29 August 2022 14:20
To: netarchivesuite-users at ml.sbforge.org
Subject: [Netarchivesuite-users] Limit both number of bytes and number of
objects per domain
Could someone please explain how this is handled?
In a snapshot harvest we want to limit both the number of bytes and the
number of objects per domain. If you enter positive values for both in the
GUI for a new snapshot harvest, what is recommended?
1. You should not. Why not?
2. You must change
   settings.harvester.scheduler.jobGen.objectLimitIsSetByQuotaEnforcer
   to false and change
   settings.harvester.harvesting.harvestReport.class
   to dk.netarkivet.harvester.harvesting.report.BnfHarvestReport (which
   doesn’t assume annotations in crawl log).
3. You can keep
   settings.harvester.scheduler.jobGen.objectLimitIsSetByQuotaEnforcer
   as true and it works …? Even though
   FRONTIER_QUEUE_TOTAL_BUDGET_PLACEHOLDER (and hence
   frontier.queueTotalBudget) is set to infinity?
   QUOTA_ENFORCER_GROUP_MAX_FETCH_SUCCES_PLACEHOLDER in template (and
   hence quotaenforcer.groupMaxFetchSuccesses) is set to infinity (in
   configureQuotaEnforcer())?
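To make alternative 2 concrete, the two settings would presumably be
written in the NetarchiveSuite settings file along these lines. This is a
sketch under the assumption that each dotted setting name maps onto nested
elements of the same names; the namespace declaration and all other
settings are left out.

```xml
<!-- Sketch of alternative 2 only; all other settings omitted. -->
<settings>
  <harvester>
    <scheduler>
      <jobGen>
        <!-- object limit not enforced by the QuotaEnforcer -->
        <objectLimitIsSetByQuotaEnforcer>false</objectLimitIsSetByQuotaEnforcer>
      </jobGen>
    </scheduler>
    <harvesting>
      <harvestReport>
        <!-- report class that does not assume annotations in the crawl log -->
        <class>dk.netarkivet.harvester.harvesting.report.BnfHarvestReport</class>
      </harvestReport>
    </harvesting>
  </harvester>
</settings>
```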
Regards,
Peter Svanberg
Technical officer
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit
National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se
www.kb.se
_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users