[Netarchivesuite-users] Getting round 500 response and NullReferenceException

Cunnea, Paul p.cunnea at nls.uk
Fri Oct 31 15:07:34 CET 2008


Hi,

 

This may be one for the Heritrix list, but I thought I would try here
first as we are using NetarchiveSuite (with Heritrix 1.12). We're still
essentially novices at using NetarchiveSuite and Heritrix here at the
National Library of Scotland.

 

We are getting 500 Internal Server Error responses when attempting to
archive a site (http://www.scotland.gov.uk/). The crawler fetches
robots.txt, follows a redirect from the seed to
http://www.scotland.gov.uk/Home, receives a 500 there, and then crawls
nothing else. We get the same result with additional seeds. The initial
crawl was set to ignore robots.txt, but the result is the same with the
classic robots.txt policy.

 

We have replicated the problem using standalone Heritrix 1.14, but are
able to archive the site using an alternative crawler. We're assuming
the problem lies with the website and how Heritrix is fetching content,
but would like to know if there is anything we can do via the harvest
template settings before contacting the website owner.
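
One check that could be run before contacting the site owner is whether
the 500 depends on the request headers the crawler sends. That is only a
hypothesis; nothing in the logs below confirms it. The Python sketch
below fetches the same URL with two different User-Agent values and
reports the final status code for each. The function names and both
agent strings are placeholders, not the values Heritrix or the
alternative crawler actually send.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request carrying the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=30):
    """Return the final HTTP status code for url (redirects are followed)."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return err.code


def compare_agents(url, agents):
    """Map each User-Agent string to the status code the server returns."""
    return {agent: fetch_status(url, agent) for agent in agents}
```

Calling compare_agents("http://www.scotland.gov.uk/", [crawler_agent,
browser_agent]) with the user-agent value from the harvest template and
a typical browser string would show whether the two are treated
differently. If they are, the user-agent setting in the harvest template
is one thing to experiment with.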

 

Excerpt from crawl log:



metadata://netarkivet.dk/crawl/reports/responsecode-report.txt?heritrixVersion=1.12.1b&harvestid=63&jobid=286 127.0.0.1 20081029143506 text/plain 40
[rescode] [#urls]
1 1
200 1
302 1
500 1
 
metadata://netarkivet.dk/crawl/reports/seeds-report.txt?heritrixVersion=1.12.1b&harvestid=63&jobid=286 127.0.0.1 20081029143506 text/plain 106
[code] [status] [seed] [redirect]
302 CRAWLED http://www.scotland.gov.uk/ http://www.scotland.gov.uk/Home
 
metadata://netarkivet.dk/crawl/logs/crawl.log?heritrixVersion=1.12.1b&harvestid=63&jobid=286 127.0.0.1 20081029143506 text/plain 761
2008-10-29T14:35:05.238Z     1         60 dns:www.scotland.gov.uk P http://www.scotland.gov.uk/ text/dns #001 20081029143504818+82 sha1:YW3TTZVRWR66P5FJGU3M6H6RTC73JCPA - content-size:60
2008-10-29T14:35:05.682Z   200        214 http://www.scotland.gov.uk/robots.txt P http://www.scotland.gov.uk/ text/plain #003 20081029143505562+115 sha1:EXHMPB3HYORL26TZO5SEWZCFJMOLOGHE - content-size:579
2008-10-29T14:35:06.084Z   302        122 http://www.scotland.gov.uk/ - - text/html #001 20081029143505992+73 sha1:LO334SHJODRDP46VXYE6E66HX4TGCHNN - content-size:476,3t
2008-10-29T14:35:06.522Z   500       4602 http://www.scotland.gov.uk/Home R http://www.scotland.gov.uk/ text/html #003 20081029143506393+119 sha1:T2SCFPKKFQPTMRPWNQVI7QLJLY6V3KYO - content-size:4956
 
metadata://netarkivet.dk/crawl/logs/local-errors.log?heritrixVersion=1.12.1b&harvestid=63&jobid=286 127.0.0.1 20081029143503 text/plain 0
 
metadata://netarkivet.dk/crawl/logs/progress-statistics.log?heritrixVersion=1.12.1b&harvestid=63&jobid=286 127.0.0.1 20081029143506 text/plain 472
20081029143504 CRAWL RESUMED - Running
           timestamp  discovered      queued   downloaded doc/s(avg)  KB/s(avg)   dl-failures   busy-thread   mem-use-KB heap-size-KB   congestion   max-depth   avg-depth
20081029143506 CRAWL ENDING - Finished
2008-10-29T14:35:06Z           4           0            4 4(4)       5(5)             0             0        20045         33792 1           0           0
20081029143506 CRAWL ENDED - Finished
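
For reference, the response-code report in the excerpt above can be
reproduced from crawl.log by counting the second whitespace-separated
field of each entry. The Python sketch below does that for the four
entries shown, each rejoined onto one line. It assumes one log entry per
line; the function name is only illustrative.

```python
from collections import Counter


def tally_status_codes(crawl_log_lines):
    """Count crawl.log entries per fetch status code (the second field)."""
    counts = Counter()
    for line in crawl_log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        counts[fields[1]] += 1
    return counts


# The four crawl.log entries from the excerpt, one entry per line.
sample = [
    "2008-10-29T14:35:05.238Z     1         60 dns:www.scotland.gov.uk P http://www.scotland.gov.uk/ text/dns #001 20081029143504818+82 sha1:YW3TTZVRWR66P5FJGU3M6H6RTC73JCPA - content-size:60",
    "2008-10-29T14:35:05.682Z   200        214 http://www.scotland.gov.uk/robots.txt P http://www.scotland.gov.uk/ text/plain #003 20081029143505562+115 sha1:EXHMPB3HYORL26TZO5SEWZCFJMOLOGHE - content-size:579",
    "2008-10-29T14:35:06.084Z   302        122 http://www.scotland.gov.uk/ - - text/html #001 20081029143505992+73 sha1:LO334SHJODRDP46VXYE6E66HX4TGCHNN - content-size:476,3t",
    "2008-10-29T14:35:06.522Z   500       4602 http://www.scotland.gov.uk/Home R http://www.scotland.gov.uk/ text/html #003 20081029143506393+119 sha1:T2SCFPKKFQPTMRPWNQVI7QLJLY6V3KYO - content-size:4956",
]

# Print codes in numeric order, matching the layout of the report.
for code, count in sorted(tally_status_codes(sample).items(), key=lambda kv: int(kv[0])):
    print(code, count)
```

This prints the same four rows as the report (1, 200, 302 and 500, one
URL each). The code 1 entry is the DNS lookup, so the only HTTP fetches
are robots.txt, the seed redirect, and the failing /Home page.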
 

 

When the harvested page is viewed via the proxy viewer, it shows an
unhandled exception error:

 

Exception Details: System.NullReferenceException: Object reference not
set to an instance of an object.

 

The stack trace is:

 

[NullReferenceException: Object reference not set to an instance of an object.]
   ScottishExecutive.PageCache.ServePage(String pgAlias) +272
   ASP.global_asax.Application_ResolveRequestCache(Object sender, EventArgs e) +181
   System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +92
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +64



 

Thanks for any advice,

Paul

 

 

Paul Cunnea

Digital Collections Manager

National Library of Scotland

t: +44-131-623-4671  e: p.cunnea at nls.uk

 

