[Netarchivesuite-users] Exclude Domain from Fullcrawl
aponb at gmx.at
Fri Apr 24 11:09:46 CEST 2009
I would like to know whether there is any way to exclude domains from
a full crawl, other than via the crawlertraps in settings.xml and the
limits configuration for that domain (which can only be a workaround).
The thing is that if you have selective crawls containing seeds that do
not belong to your national domain, then that foreign domain is created
in order to execute the selective crawls. When you start the first
full harvest, that domain is crawled as well, although it doesn't
belong to your range.
Another possibility would be to modify the seed belonging to the
defaultconfig so that only that seed is crawled during the domain
crawl. That seed could of course be the one that was used during the
selective crawl. But that link could already be outdated, in which case
nothing would be crawled at all. It works, but it's a workaround.
I would just like to know what you think about this and how you are
solving this issue.
Thanks for your time
a.