<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Mangal;
panose-1:0 0 4 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:"\@SimSun";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;
mso-fareast-language:EN-US;}
span.E-postmall17
{mso-style-type:personal-compose;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="SV" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">In our last broad crawl in 2024, with NAS 7.5, we had serious problems with high load on the harvesting servers, which made jobs crash. A large share of those crashes were due to NullPointerExceptions at places in the code that were fixed
in 7.6.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Our first crawl in 2025, with NAS 7.6, then went very well, with none of the above problems and a much lower load on average. Nice, we thought: version 7.6 and the reduction of the number of instances per harvest server made everything run smoothly.
But no …<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">The first pass of our next broad crawl has now returned to high load and crashes, although not quite as bad as in 2024. We have learned that there is at least one more NullPointerException place to fix, but it does not seem
to happen as often as the previous ones. There were also other crashes that I associate with high load:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dk.netarkivet.common.exceptions.IOFailure: Port 8213 already in use, or port is out of range<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">dk.netarkivet.common.exceptions.IOFailure: Heritrix3 could not be shut down<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Connect to kw3-harvester12.kb.se:8243 [kw3-harvester12.kb.se/193.10.72.194] failed: Connection refused<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">All parameters and limits are the same between the earlier, smooth pass 1 and this less smooth pass 1, except for some minor differences in the templates. The only thing I can think of that could influence anything is some changes
in the crawler trap regexes: values of the type<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span><span lang="EN-US" style="font-family:"Courier New""><value>.*action=buy_now.*</value><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">changed to<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span><span lang="EN-US" style="font-family:"Courier New""><value>https?://[^/]+/.*action=buy_now.*</value><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">and another regex was made a bit more complicated. But I doubt that this could have that much of an impact.<o:p></o:p></span></p>
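<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">One rough way to test that doubt would be to time the two forms of the value against some sample URIs. The sketch below is only an illustration: the class name, the sample URIs, the number of rounds and the assumption that a value has to match the entire URI are mine, not taken from the NAS or Heritrix code, and a simple loop like this gives an indication at best.<o:p></o:p></span></p>

```java
import java.util.regex.Pattern;

public class TrapRegexTiming {
    // The two forms of the crawler trap value quoted above.
    static final Pattern OLD_FORM = Pattern.compile(".*action=buy_now.*");
    static final Pattern NEW_FORM = Pattern.compile("https?://[^/]+/.*action=buy_now.*");

    // Assumption: a value has to match the entire URI (Matcher.matches()).
    static int countTraps(Pattern pattern, String[] uris) {
        int hits = 0;
        for (String uri : uris) {
            if (pattern.matcher(uri).matches()) {
                hits++;
            }
        }
        return hits;
    }

    // Runs the pattern over the samples `rounds` times and prints the elapsed time.
    static void report(String label, Pattern pattern, String[] uris, int rounds) {
        long start = System.nanoTime();
        long hits = 0;
        for (int i = 0; i != rounds; i++) {
            hits += countTraps(pattern, uris);
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + millis + " ms, " + hits + " hits");
    }

    public static void main(String[] args) {
        // Made-up sample URIs, including one long URI that does not match.
        StringBuilder longPath = new StringBuilder("https://www.example.org/");
        for (int i = 0; i != 500; i++) {
            longPath.append("x/");
        }
        String[] uris = {
            "https://shop.example.com/cart?action=buy_now",
            "http://www.example.org/a/b/index.html",
            longPath.toString()
        };
        int rounds = 20_000;
        // The first two runs only warm up the JVM; compare the last two.
        report("warm-up old", OLD_FORM, uris, rounds);
        report("warm-up new", NEW_FORM, uris, rounds);
        report("old form", OLD_FORM, uris, rounds);
        report("new form", NEW_FORM, uris, rounds);
    }
}
```

<p class="MsoNormal"><span lang="EN-US">Replacing the sample URIs with real ones from a crawl log would make the comparison more meaningful. Note also that the new form only matches when a scheme and host come first, so a string without them would be caught by the old form but not by the new one.<o:p></o:p></span></p>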
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Do you have any tips or hints?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif;mso-ligatures:none"><o:p> </o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="4" cellpadding="0">
<tbody>
<tr>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" style="line-height:120%"><a href="https://www.kb.se/"><span style="font-size:9.0pt;line-height:120%;font-family:"Arial",sans-serif;color:blue;mso-fareast-language:SV;text-decoration:none"><img border="0" width="113" height="170" style="width:1.1785in;height:1.7738in" id="Bildobjekt_x0020_3" src="cid:image001.png@01DC2974.EF4D2DB0" alt="https://signaturloggor.kb.se/png/Outlook%20logo%20m%d0%a4rkbl%d0%96.png"></span></a><span style="font-size:9.0pt;line-height:120%;font-family:"Arial",sans-serif;color:black;mso-ligatures:none"><o:p></o:p></span></p>
</td>
<td style="padding:0cm 0cm 0cm 5.25pt">
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-US" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">Peter Svanberg</span></b><span lang="EN-US" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">Technical officer
</span></b><span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">Legal Deposit and Metadata Department<br>
Digital Material Legal Deposit Unit<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:4.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV"><o:p> </o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">National Library of Sweden</span></b><span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span lang="EN-US" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">PO Box 5039, SE-102 41 Stockholm<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">Visits: Karlavägen 96, Stockholm<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">+46 10-709 32 78<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV">Peter.Svanberg@kb.se<o:p></o:p></span></p>
<p class="MsoNormal" style="line-height:120%"><span style="font-size:8.0pt;line-height:120%;font-family:"Arial",sans-serif;color:black;mso-ligatures:none;mso-fareast-language:SV"><a href="https://www.kb.se/"><span style="color:#3C6F9C">www.kb.se</span></a></span><span style="font-size:8.0pt;line-height:120%;font-family:"Arial",sans-serif;color:black;mso-ligatures:none"><o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:"Times New Roman",serif;display:none;mso-ligatures:none;mso-fareast-language:SV"><o:p> </o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
</body>
</html>