[Netarchivesuite-users] Problem with QA

Søren Vejrup Carlsen svc at kb.dk
Mon Dec 3 15:43:04 CET 2012


Hi Meelis.
Without more information about your crawl setup (order.xml and seeds.txt) and the domains being harvested, we cannot really help you much. Could you send us a couple of the domains (with start seeds) that you're harvesting? Then we can see whether or not the problem is a configuration problem.

BTW, why are you running a development release of NetarchiveSuite? The only reason for running 3.21.0 is its support for using WARC instead of ARC as the archival format. If you're not interested in that, you would probably be better off running the 3.20 release.

Best Regards
Søren Vejrup Carlsen
Developer and QA of NetarchiveSuite

-----Original message-----
From: netarchivesuite-users-bounces at ml.sbforge.org [mailto:netarchivesuite-users-bounces at ml.sbforge.org] On behalf of Meelis Mihhailov
Sent: 3 December 2012 13:40
To: Netarchive Suite Users
Subject: [Netarchivesuite-users] Problem with QA

Hi all!

I have a problem with NAS 3.21.0 QA indexing.
We use two configurations for our crawl, one with max-hops=25 and the other with max-hops=0.

Everything worked fine until now. When we create an index for the crawl in order to do QA, all the main addresses return "not found" errors. That is, www.server.com is not found, but all other URLs that point to resources (.js, .css, images and files) are displayed OK.

This does not affect the links that are crawled with max-hops=0.

Can anyone help me figure out what is wrong? All logs show that the main domain is crawled. All ARC files contain the content that is fetched when www.server.com is crawled, and the index segments show that the resource is there and points to the correct ARC file.

At the moment I haven't restarted NAS, as we are currently in the middle of the crawl.


Meelis Mihhailov
National Library Of Estonia
meelis at nlib.ee

_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at ml.sbforge.org
http://ml.sbforge.org/mailman/listinfo/netarchivesuite-users
