<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:3.0cm 2.0cm 3.0cm 2.0cm;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:167446035;
mso-list-type:hybrid;
mso-list-template-ids:-23454276 -239307020 67502083 67502085 67502081 67502083 67502085 67502081 67502083 67502085;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:-;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Calibri",sans-serif;
mso-fareast-font-family:Calibri;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DA" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">In brief, here is what we worked on since our last meeting:<o:p></o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US"><o:p> </o:p></span></b></p>
<p class="MsoNormal"><b><span lang="EN-US">Broad crawl<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">Step 2 is proceeding well.<o:p></o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US"><o:p> </o:p></span></b></p>
<p class="MsoNormal"><b><span lang="EN-US">Event crawl<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">We decided to continue the event crawl on Corona in Denmark, but at a lower frequency, with the number of 0-hop sites greatly reduced, and with minimal curatorial activity.
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="mso-fareast-language:DA">Alexandre, trainee
<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA">Alexandre has arrived and is up and running, working remotely from Copenhagen with the rest of the team. We are almost done with the introductory program and are looking into what will give the most value to Alexandre,
Netarkivet and also BnF.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">IT-University in Copenhagen:<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">The collaboration with the IT-University in Copenhagen is moving forward.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">YouTube<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">We have experimented with capturing embedded video content, and so far the results are great (except that WARC validation fails on revisit records).<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US">WARC-file-validation<o:p></o:p></span></b></p>
<p class="MsoNormal"><span lang="EN-US">We are working on finalizing a workflow from Webrecorder/Conifer.org to Netarkivet. Being able to validate WARC files correctly is a big part of achieving the right level of preservation (we use JWAT for this). But it's
a bit complicated; see for instance this OPF blog post by Remco van Veenendaal from Holland:
</span><a href="https://openpreservation.org/blogs/warc-validation-tool-experiences/"><span lang="EN-US">https://openpreservation.org/blogs/warc-validation-tool-experiences/</span></a><span lang="EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">(How) are you validating WARC files? And what do you see as the future of this?
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Additional information from Tue:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">We have just discovered the following errors in our WARC files. They seem to go back to when we activated revisit generation in 2017/2018, after the compression of the Netarchive.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">It is our NAS code that generates an invalid 'WARC-Payload-Digest' format.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">We had never before tested whether our revisit 'WARC-Payload-Digest' format was valid according to the WARC standard.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">e.g.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">Error in '/home/prod/317160-265-20190806070159701-00000-sb-prod-har-006.statsbiblioteket.dk.warc.gz'<br>
Offset: 160011 (0x2710b)<br>
Record Type: 'revisit'<br>
Type: INVALID_EXPECTED<br>
Entity: 'WARC-Payload-Digest' value<br>
Value: ICLH3F6J3NMEIBRGD7ICP255OXIUDRWH<br>
Expected: &lt;digest-algorithm&gt;:&lt;digest-encoded&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">See:<br>
</span><span style="color:black"><a href="http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1_latestdraft.pdf" target="_blank" title="http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1_latestdraft.pdf"><span lang="EN-US">http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1_latestdraft.pdf</span></a></span><span lang="EN-US" style="color:black"> 2008<br>
<a href="http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1-1_latestdraft.pdf">http://bibnum.bnf.fr/warc/WARC_ISO_28500_version1-1_latestdraft.pdf</a> 2017<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-US" style="color:black">We are missing the "&lt;digest-algorithm&gt;:" part.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">BTW, the Webrecorder revisit WARC files have the same 'WARC-Payload-Digest' format issue.<o:p></o:p></span></p>
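<p class="MsoNormal"><span lang="EN-US" style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:black">As a rough illustration of the check that fails here, the following Python sketch tests whether a 'WARC-Payload-Digest' value has the labelled form &lt;digest-algorithm&gt;:&lt;digest-encoded&gt;. It is not how JWAT or our NAS code does it: the function names are hypothetical, the pattern is a simplified approximation of the grammar in the WARC standard, and the "sha1" label is only assumed for illustration (the 32-character Base32 value would be consistent with SHA-1, but the error report does not say so).<o:p></o:p></span></p>
<pre>
```python
import re

# Simplified form of the labelled digest "algorithm:digest-value".
# Assumption: algorithm names use letters, digits and hyphens, and the
# encoded digest uses only Base32/Base64/hex characters.
LABELLED_DIGEST = re.compile(r"([A-Za-z0-9-]+):([A-Za-z0-9+/=_-]+)")


def split_labelled_digest(value):
    """Return (algorithm, digest) if value is labelled, otherwise None."""
    match = LABELLED_DIGEST.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def check_payload_digest(value):
    """Return a short verdict for a WARC-Payload-Digest header value."""
    parts = split_labelled_digest(value)
    if parts is None:
        return "invalid: missing algorithm label"
    return "valid: algorithm=" + parts[0]


if __name__ == "__main__":
    # Value from the error report above (no algorithm label).
    print(check_payload_digest("ICLH3F6J3NMEIBRGD7ICP255OXIUDRWH"))
    # Same value with a hypothetical "sha1" label, for illustration only.
    print(check_payload_digest("sha1:ICLH3F6J3NMEIBRGD7ICP255OXIUDRWH"))
```
</pre>
<p class="MsoNormal"><span lang="EN-US" style="color:black">The unlabelled value from the error report is rejected because it contains no colon-separated algorithm name, while the labelled variant passes. A check like this only looks at the shape of the header value; it does not recompute the digest from the payload.<o:p></o:p></span></p>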
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">All the best on behalf of the Netarkivet-Team<o:p></o:p></span></p>
<p class="MsoNormal">Anders<span style="mso-fareast-language:DA"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b><span style="mso-fareast-language:DA">Anders Klindt Myrvoll</span></b><span style="mso-fareast-language:DA"><br>
Faglig leder - Netarkivet<br>
<span style="color:gray">Programme Manager – the Danish web archive</span><o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA">Digital Kulturarv<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:gray;mso-fareast-language:DA">Digital Cultural Heritage<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA">+45 26850080<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><a href="mailto:ANKM@kb.dk">ANKM@kb.dk</a><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA"><img border="0" width="115" height="90" style="width:1.2in;height:.9416in" id="Billede_x0020_7" src="cid:image001.png@01D69B63.69B6BEE0" alt="cid:image003.png@01D50424.1ED49640"></span><span lang="EN-US" style="mso-fareast-language:DA"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA">Det Kgl. Bibliotek<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA">Royal Danish Library<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA">Søren Kierkegaards Plads 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA">DK-1221 København K<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA">+45 3347 4747<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:DA"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#5A5A5A;mso-fareast-language:DA">CVR 2898 8842<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#5A5A5A;mso-fareast-language:DA">EAN 5798 000 795297</span><span style="mso-fareast-language:DA"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
</div>
</body>
</html>