[Netarchivesuite-users] Some Questions
imagenoise at aol.com
Sat May 3 12:06:12 CEST 2014
Dear Netarchive Suite users,
I have a few questions and would be very thankful for some help.
I'm running the Quickstart installation of the 4.2 suite.
1) How can I delete a selective harvest from the list?
I can delete the domains, but not the harvest.
2) Is it only possible to add complete domains or subdomains, and not
URLs with a path?
If I try to add, e.g., "facebook.com/ladygaga", I get:
"The following domains are illegal and cannot be added".
3) How do I run several jobs at the same time?
In my setup (single machine) I very often have selective harvests with
only one domain. Because Heritrix is very polite/conservative, with just
1 thread per page, I have lots of resources left, even with a non-fancy
server and a slow connection. Running several instances of Heritrix at
the same time to harvest several jobs would save a lot of time.
4) You wrote in 2011 "Only a very limited number of researchers are
currently using the Wayback access to the Danish webarchives. The
Viewerproxy is used for Curator access to the Archive." Is that still
the case? If so, how do researchers access different versions of a URL
that has been harvested at different dates? Do they have access to
Harveststatus-jobdetails.jsp to manually press "Select this job for QA
5) Your integration of full-text search with Solr seems to have been
successful. Are you going to publish a HOWTO or make the wiki
public? Are you using Solr Cell/ExtractingRequestHandler or custom code?
(6) You tested the OpenWayback 2.0 beta. Is it possible to access
sites harvested over HTTPS with it?)
Thanks a lot and have a nice weekend!