[Netarchivesuite-devel] Seed config limit

aponb at gmx.at aponb at gmx.at
Fri Jan 10 23:23:38 CET 2020


How do you handle domains with many seeds during a crawl? For example, I
am crawling the domain wordpress.com, and my default seed list contains
100 seeds (host1.wordpress.com to host100.wordpress.com) with a limit of
100 MB applied. The crawl starts with all seeds and, of course, finishes
as soon as it reaches the domain-config limit of 100 MB, so many seeds
are barely touched. What I really want is a seed-config limit of 100 MB,
that is, a limit applied per seed rather than per domain. How can I
achieve this? How can I enforce a limit per seed? Do you have any ideas?

Regards

a.


