<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:3.0cm 2.0cm 3.0cm 2.0cm;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DA" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Dear all.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US">Here is what we are grappling with just now:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">Broad crawl:</span></b><span lang="EN-US"> first step of our third broad crawl for 2018 started on August 25 and is still ongoing.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">Selctive crawl:</span></b><span lang="EN-US"> September 5 is the official commemoration day for Danish soldiers, who had been deployed in war or conflict zones. Together with partners from the Danish National Archives,
we are running an event crawl on this commemoration. We used BCWeb for the nomination of the url’s. Everything went fine – Steven has hardcoded the needed schedules, as we have no schedules with integrated hops with Heritrix 5. However, after the fourth crawl
all crawls failed without any changes. We made test crawls with the BCWeb schedules – they work fine. We still have not solved the problem. Therefore, we created a “replacement” event harvest definition without using BCWeb.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">Open wayback:</span></b><span lang="EN-US"> We are now able to display pages using https, but far from all https-pages. For instance, we are not able to display social media pages.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">Blacklight</span></b><span lang="EN-US"> (fulltext search): the facets to refine a search do not work.<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;background:white">
<b><span lang="EN-US" style="font-size:10.5pt;color:black;mso-fareast-language:DA">SOLRWayback</span></b><span lang="EN-US" style="font-size:10.5pt;color:black;mso-fareast-language:DA">: we made some tests in our production environment. The results are promising:
we are able to display pages form Twitter and Facebook crawls after they started using https. Now the most important is to resolve problems with the proxy browser setup.</span><span lang="EN-US" style="font-size:10.5pt;color:#333333;mso-fareast-language:DA"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">On behalf of the Netarchive Team<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D;mso-fareast-language:DA">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D;mso-fareast-language:DA">Sabine<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>