[Netarchivesuite-devel] Seed config limit
sara.aubry at bnf.fr
Wed Jan 29 09:21:39 CET 2020
Hi Andreas,
I'm not sure whether you ever got an answer to this question.
While going through the Heritrix Slack channel, I saw this:
https://iipc.slack.com/archives/C2F63EUV7/p1567535683011800
Also, I think our colleagues from KB-DK are using a similar bean for this.
Sara
From: aponb at gmx.at
To: "netarchivesuite-devel at ml.sbforge.org"
<netarchivesuite-devel at ml.sbforge.org>
Date: 10/01/2020 23:23
Subject: [Netarchivesuite-devel] Seed config limit
Sent by: "Netarchivesuite-devel"
<netarchivesuite-devel-bounces at ml.sbforge.org>
How do you handle domains with many seeds during a crawl? For example, I
am crawling the domain wordpress.com with 100 seeds in my default seed
list (host1.wordpress.com to host100.wordpress.com) and a limit of
100 MB. The crawl starts with all seeds and, of course, finishes when it
reaches the domain-config-limit of 100 MB, so many seeds are only
touched and nothing more. What I really want is a seed-config-limit of
100 MB. How can I achieve this? How can I enforce a limit per seed? Do
you have any ideas?
Regards
a.
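
The difference between the two kinds of limit described above can be
illustrated with a small simulation. This is only a sketch in Python, not
Heritrix or NetarchiveSuite code: the function names (crawl, domain_of,
seed_of), the round-robin fetch order and the fixed 1 MB object size are
all assumptions made for illustration.

```python
from collections import defaultdict

MB = 1024 * 1024


def crawl(fetches, limit_bytes, key):
    """Accept fetches until the budget of their group is exhausted.

    fetches: iterable of (host, size_in_bytes) in fetch order.
    key: function mapping a host to the group whose budget it consumes.
    Returns the number of bytes accepted per host.
    """
    used = defaultdict(int)      # bytes consumed per budget group
    per_host = defaultdict(int)  # bytes accepted per host
    for host, size in fetches:
        group = key(host)
        if used[group] + size > limit_bytes:
            continue  # budget for this group is exhausted, skip the fetch
        used[group] += size
        per_host[host] += size
    return dict(per_host)


def domain_of(host):
    # Naive assumption: the domain is the last two labels of the host name.
    return ".".join(host.split(".")[-2:])


def seed_of(host):
    # Hypothetical per-seed grouping: every seed host has its own budget.
    return host


hosts = [f"host{i}.wordpress.com" for i in range(1, 101)]
# Assumed workload: each host offers 200 objects of 1 MB, fetched round-robin.
fetches = [(h, MB) for _ in range(200) for h in hosts]

by_domain = crawl(fetches, 100 * MB, domain_of)
by_seed = crawl(fetches, 100 * MB, seed_of)

print(by_domain["host1.wordpress.com"] // MB)  # 1
print(by_seed["host1.wordpress.com"] // MB)    # 100
```

With one shared budget per domain, each of the 100 seeds gets only 1 MB
before the limit is reached. With one budget per seed, each seed can use
the full 100 MB.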