<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000080;font-family:Calibri,Arial,Helvetica,sans-serif;" dir="ltr">
<p>Dear all.</p>
<p><br>
</p>
<p>Here is a brief update from KB DK:</p>
<p><br>
</p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">We are preparing a two-day workshop for Netarchive curators on harvesting
social media. Hopefully the outcome will be useful for our upcoming event harvest of the local and regional elections on 21 November. We also aim to use BCWeb with external partners for the election event harvest.</span></p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">The developers will hold a workshop in mid-October. The
curators' wishes for it are as follows (in order of priority):</span></p>
<ul style="list-style-type: disc; direction: ltr;">
<li style="color: rgb(51, 51, 51); font-size: 10.5pt; font-style: normal; font-weight: normal;">
<p style="background: white; color: rgb(0, 0, 0); font-size: 11pt; font-style: normal; font-weight: normal; margin-top: 7.5pt; margin-bottom: 0pt; mso-add-space: auto; mso-list: l0 level1 lfo1;">
<span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Replay of HTTPS pages in Wayback</span></p>
</li><li style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; font-style: normal; font-weight: normal;">
<p style="background: white; color: rgb(0, 0, 0); font-family: "Calibri","sans-serif"; font-size: 11pt; font-style: normal; font-weight: normal; margin-top: 7.5pt; margin-bottom: 0pt; mso-add-space: auto; mso-list: l0 level1 lfo1;">
<span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Improvement of Heritrix and integration of supplementary collection tools (e.g. brozzler)</span></p>
</li><li style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; font-style: normal; font-weight: normal;">
<p style="background: white; color: rgb(0, 0, 0); font-family: "Calibri","sans-serif"; font-size: 11pt; font-style: normal; font-weight: normal; margin-top: 7.5pt; margin-bottom: 0pt; mso-add-space: auto; mso-list: l0 level1 lfo1;">
<span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Introduction of a (technical) collection concept. This will let us integrate data collected before NAS was in use or collected without
NAS.</span></p>
</li><li style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; font-style: normal; font-weight: normal;">
<p style="background: white; color: rgb(0, 0, 0); font-family: "Calibri","sans-serif"; font-size: 11pt; font-style: normal; font-weight: normal; margin-top: 7.5pt; margin-bottom: 0pt; mso-add-space: auto; mso-list: l0 level1 lfo1;">
<span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Improvement of Access</span></p>
</li><li style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; font-style: normal; font-weight: normal;">
<p style="background: white; color: rgb(0, 0, 0); font-family: "Calibri","sans-serif"; font-size: 11pt; font-style: normal; font-weight: normal; margin-top: 7.5pt; margin-bottom: 0pt; mso-add-space: auto; mso-list: l0 level1 lfo1;">
<span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">More automated QA</span></p>
</li></ul>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Most likely we will not be able to perform a full two-step broad crawl
this year (our last full broad crawl dates from the beginning of 2016) because of our problems with Heritrix 3 Remote Access. We expect to solve this problem with NAS 5.4, which will be implemented once the compression of the archive is finished
at the beginning of 2018.</span></p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">Since January 2017 we have harvested only about 25 TB.</span></p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">At the beginning of September 2017, Netarchive was blocked by about 54,000
domains (out of 1.32 million domains).</span></p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">The implementation of “Web Danica” (automated identification of Danish web
content outside .dk) is ongoing.</span></p>
<p style="background: white; margin: 7.5pt 0cm 0pt;"><span lang="EN-US" style="color: rgb(51, 51, 51); font-family: "Arial","sans-serif"; font-size: 10.5pt; mso-ansi-language: EN-US;">The migration of documentation from the old “MediaWiki” to Jira is finished.</span></p>
<p><br>
</p>
<p>Talk to you later today <img class="EmojiInsert" id="OWAEmoji43987" style="vertical-align: bottom;" alt="😊" src="cid:c3a23a34-8719-44ba-9c9f-bad379b06318"></p>
<p><br>
</p>
<p><br>
</p>
<div id="Signature">
<div style="margin: 0px; font-family: Calibri,Arial,Helvetica,sans-serif;" name="divtagdefaultwrapper">
Best, Sabine<font face="Tahoma" size="2"><br>
</font></div>
</div>
<div style="color: rgb(0, 0, 0);">
<hr tabindex="-1" style="width: 98%; display: inline-block;">
<div id="divRplyFwdMsg" dir="ltr"><font color="#000000" face="Calibri, sans-serif" style="font-size: 11pt;"><b>From:</b> Netarchivesuite-curator &lt;netarchivesuite-curator-bounces@ml.sbforge.org&gt; on behalf of peter.stirling@bnf.fr &lt;peter.stirling@bnf.fr&gt;<br>
<b>Sent:</b> 2 October 2017 13:55<br>
<b>To:</b> netarchivesuite-curator@ml.sbforge.org<br>
<b>Subject:</b> [Netarchivesuite-curator] BnF NAS update for October</font>
<div> </div>
</div>
<div><font face="sans-serif" size="2">Hello all,</font><br>
<br>
<font face="sans-serif" size="2">There have been several changes in the team over the summer. Pascal Tanésie has arrived as assistant head of the digital legal deposit team, and Vladimir Tybin has joined the team as digital curator. Sophie Derrot has left the
BnF to take up a post at the Institut national d'histoire de l'art.</font><br>
<br>
<font face="sans-serif" size="2">Our second test broad crawl, with the complete seed list, is nearly finished. The amount of data crawled in this test has proved to be higher than our budget estimates, mainly because there is no deduplication for this first
broad crawl with H3. We will analyze the figures in detail and adapt the budget accordingly.
</font><br>
<br>
<font face="sans-serif" size="2">We are also using our new infrastructure for the tests: the crawlers are more powerful and faster, but they use more bandwidth. We will therefore need to reduce the number of crawlers from 40 to 35. We had set the duration of
each job to 3 days, but this has proved to be too much; for the real crawl it will be between 2 and 2.5 days.</font><br>
<br>
<font face="sans-serif" size="2">This week we aim to transfer all our crawls onto the new infrastructure, and next week the real broad crawl will start.</font><br>
<br>
<font face="sans-serif" size="2">Best regards,</font><br>
<font face="sans-serif" size="2">The BnF digital legal deposit team</font><br>
<font face="sans-serif">
<hr>
<p>New:<br>
<strong><a href="http://heritage.bnf.fr/bibliothequesorient/fr">Launch of the Bibliothèques d’Orient website</a></strong> - 7,000 documents from 9 collections on a trilingual site</p>
<p style="color: rgb(0, 128, 0);"><strong>Please consider the environment before printing.</strong></p>
</font></div>
</div>
</div>
</body>
</html>