[Netarchivesuite-users] Some Questions

Peter M imagenoise at aol.com
Sat May 3 12:06:12 CEST 2014

Dear Netarchive Suite users,

I have a few questions and would be very thankful for some help.
I'm running the Quickstart installation of NetarchiveSuite 4.2.

1) How can I delete a selective harvest from the list?
I can delete the domains, but not the harvest itself.

2) Is it only possible to add complete domains or subdomains, and not
individual paths below a domain?

If I try to add, e.g., "facebook.com/ladygaga", I get:

"The following domains are illegal and cannot be added".

3) How do I run several jobs at the same time?
In my setup (single machine) I very often have selective harvests with
only one domain. Because Heritrix is very polite/conservative, with just
1 thread per page, I have lots of resources left even with a non-fancy
server and a slow connection. Running several instances of Heritrix at
the same time to harvest several jobs would save a lot of time.

4) You wrote in 2011 "Only a very limited number of researchers are
currently using the Wayback access to the Danish webarchives. The
Viewerproxy is used for Curator access to the Archive."[1] Is that still
the case? So how do researchers access different versions of a URL
that has been harvested on different dates? Do they have access to
Harveststatus-jobdetails.jsp to manually press "Select this job for QA
with viewerproxy"?

5) Your integration of full-text search with Solr seems to have been
successful[2]. Are you going to publish a HOWTO or make the wiki[3]
public? Are you using Solr Cell/ExtractingRequestHandler or custom code?

( 6) You tested the OpenWayback 2.0 beta. Is it possible to access
sites harvested over HTTPS? )

Thanks a lot and have a nice weekend!



[1] https://sbforge.org/display/NAS/Wayback+usage+in+the+Danish+webarchive
[2] https://sbforge.org/display/NAS/2013-06-18+Statusmeeting
