[Netarchivesuite-users] Harvest a subdirectory of a domain
netarkivet at statsbiblioteket.dk
Thu Apr 24 10:06:53 CEST 2008
Yes - there certainly is.
You need to make a new configuration for the domain (mydomain.org) that covers only that part.
(1) Add the domain to your system
(2) Add a new configuration (or change defaultconfig)
(3) Select another template for your new configuration (or the changed defaultconfig)
- you need a path-scope-template - the NetarchiveSuite distribution should come with at least two such templates (having 'path' in
- depending on whether you are using TRUNK from svn or the latest stable release, the templates may be named differently, since we are moving towards
DecidingScope in our use of Heritrix.
The seedlist for your new configuration must contain seeds that include a path, ending with a trailing slash:
- www.mydomain.com/subdir - will allow the ENTIRE host www.mydomain.com
- www.mydomain.com/subdir/ - will only allow the path /subdir/ (and subdirs to that) on the host www.mydomain.com
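The trailing-slash rule above can be illustrated with a small Python sketch. This is only a model of the behaviour described in this message, not Heritrix or NetarchiveSuite code; the function names scope_prefix and in_scope are made up for the illustration, and it assumes the scope is the seed's path cut off after its last slash.

```python
from urllib.parse import urlsplit


def _split(url):
    """Parse a URL, assuming http:// when the seed has no scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url)


def scope_prefix(seed):
    """Return (host, path_prefix) that a seed would allow under the rule above.

    Assumption: the path is kept up to and including its last slash,
    so '/subdir' collapses to '/' while '/subdir/' stays '/subdir/'.
    """
    parts = _split(seed)
    path = parts.path or "/"
    prefix = path[: path.rfind("/") + 1]
    return parts.hostname, prefix


def in_scope(seed, url):
    """True if url is on the seed's host and under the seed's path prefix."""
    host, prefix = scope_prefix(seed)
    candidate = _split(url)
    return candidate.hostname == host and (candidate.path or "/").startswith(prefix)


if __name__ == "__main__":
    for seed in ("www.mydomain.com/subdir", "www.mydomain.com/subdir/"):
        print(seed, "->", scope_prefix(seed))
```

With this model, the seed without the trailing slash yields the prefix "/" (the entire host), while the seed with the trailing slash yields "/subdir/", which matches the two cases listed above.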
Daily Manager - netarchive.dk
State & University Library
DK-8000 Aarhus C
T: +45 89462165 - C: +45 25662353
CVR/SE 10100682 - EAN 5798000791084
Peter Moser wrote:
> Is there a possibility to harvest only a part of a domain? I would like to add, for example, the following link to the seedlist: www.mydomain.org/subdir
> so that only that subdir is fetched.
> If that is not possible with NetarchiveSuite, where can I start changing the application to do so?
> Can I do it if I use only the Heritrix application?
> Thanks in advance for answering!