[Netarchivesuite-users] Some Questions

Peter M imagenoise at aol.com
Wed Jun 11 17:28:37 CEST 2014


Hello again,

since I'm not able to continue my NetarchiveSuite exploration without
further knowledge, I have to re-ask three of my earlier questions.

>> 1) How can I delete/pause scheduled jobs, the ones classified as "new"
>> on the "Job Status" page? I can't find anything on the job status page
>> or on the "Details for Job X" page. Deactivating on the Selective
>> Harvests page is too late because the job is already in the job queue.
>> I can cancel them via the Heritrix GUI one by one after they have
>> started, but that takes a lot of time for several jobs and is rather
>> unfortunate, because NAS thinks the job completed successfully if I
>> terminate it via the Heritrix GUI.


>> 2) > Yes it is only possible to add complete domains in the general domain
>>> listing. Subdomains or subdirectories are handled through seed lists,
>>> which are either defined on domains
>>
>> hm, ok. so to stay with the given example
>> (http://facebook.com/ladygaga), I would have to create a new selective
>> harvest definition, add facebook.com, edit the selective harvest
>> definition, add facebook.com/ladygaga as a seed, and then only the seeds
>> are harvested? do I have to deactivate facebook.com somehow so that
>> itself is not harvested but only the subdirectory? I tried it this way
>> and only 3,588 Bytes and 2 Documents got harvested.
>>
>> "Domain/Seeds for harvestdefinition ladygaga
>> Search results: 1, displaying results 1 to 1.
>>
>> previous / next
>> facebook.com (1 Seeds)
>>        http://facebook.com/ladygaga
>>
>> Total: 1 Domains / 1 Seeds"
>>
>> Actually I don't really understand the concept of seeds (domains seem to
>> be easy :) ), the given information on sbforge doesn't really help and I
>> can't find anything in the Heritrix documentation.

>> 6) HTTrack has a so-called "near flag". With this it also downloads
>> content that is embedded in the harvested page, e.g. an embedded
>> YouTube video. Is something like this also possible with NAS? Or
>> would that be a case for "Missing URL Collection"?

thanks a lot, best

peter

