[Netarchivesuite-users] Some Questions
Peter M
imagenoise at aol.com
Sat May 3 12:06:12 CEST 2014
Dear Netarchive Suite users,
I have a few questions and would be very thankful for some help.
I'm running the Quickstart installation of the 4.2 suite.
1) How can I delete a selective harvest from the list?
Under
http://localhost:8074/HarvestDefinition/Definitions-selective-harvests.jsp
I can delete the domains, but not the harvest itself.
2) Is it only possible to add complete domains or subdomains and not
subdirectories?
If I try to add, e.g., "facebook.com/ladygaga", I get:
"The following domains are illegal and cannot be added".
3) How do I run several jobs at the same time?
In my setup (a single machine) I very often have selective harvests with
only one domain. Because Heritrix is very polite/conservative, with just 1
thread per page, I have lots of resources left even with a modest server
and a slow connection. Running several instances of Heritrix at the same
time to harvest several jobs would save a lot of time.
4) You wrote in 2011 "Only a very limited number of researchers are
currently using the Wayback access to the Danish webarchives. The
Viewerproxy is used for Curator access to the Archive."[1] Is that still
the case? So how do researchers access different versions of a URL that
has been harvested at different dates? Do they have access to
Harveststatus-jobdetails.jsp to manually press "Select this job for QA
with viewerproxy"?
5) Your integration of fulltext search with Solr seems to have been
successful[2]. Are you going to publish a HOWTO or make the wiki[3]
public? Are you using Solr Cell/ExtractingRequestHandler or custom code?
(6) You tested the OpenWayback 2.0 beta. Is it possible to access
sites harvested over HTTPS?)
Thanks a lot and have a nice weekend!
Best
Peter
[1] https://sbforge.org/display/NAS/Wayback+usage+in+the+Danish+webarchive
[2] https://sbforge.org/display/NAS/2013-06-18+Statusmeeting
[3] https://sbprojects.statsbiblioteket.dk/pages/viewpage.action?pageId=5866377