[Netarchivesuite-users] Exclude Domain from Fullcrawl

aponb at gmx.at aponb at gmx.at
Fri Apr 24 11:09:46 CEST 2009


I would like to know whether there is any way to exclude domains from 
a full crawl, other than via the crawlertraps in settings.xml and the 
limits configuration for that domain (which can only be a workaround).
The problem is that if you have selective crawls containing seeds that 
do not belong to your national domain, then the domains of those seeds 
are created in order to execute the selective crawls. When you start the 
first full harvest, such a domain is crawled as well, although it 
doesn't belong to your range.
Another possibility would be to modify the seed belonging to the 
defaultconfig, so that only that seed is crawled during the domain 
crawl. That seed could of course be the one that was used during the 
selective crawl. But this link could already be outdated and wouldn't 
be crawled at all. It works, but it's a workaround.

I would just like to know what you think about this and how you are 
solving this issue.

Thanks for your time
a.



