<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:3.0cm 2.0cm 3.0cm 2.0cm;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DA" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Dear all,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Here is an update from Denmark :-)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Calibri",sans-serif;color:#203864;mso-style-textfill-fill-color:#203864;mso-style-textfill-fill-alpha:100.0%">These weeks we are focusing on getting familiar with BCWeb and on adapting
it to our needs. There will be local and regional elections at the end of the year, and we would very much like to have a “Netarchive-BCweb” in place by then, because we want to involve researchers and experts (for instance, experts in the use of social media) in helping
us find URLs.</span><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#203864;mso-style-textfill-fill-color:#203864;mso-style-textfill-fill-alpha:100.0%"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-family:"Calibri",sans-serif;color:#203864;mso-style-textfill-fill-color:#203864;mso-style-textfill-fill-alpha:100.0%">One crucial issue is that we need bulk upload
of URLs to be implemented.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Best,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D">Sabine<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces@ml.sbforge.org]
<b>On Behalf Of </b>peter.stirling@bnf.fr<br>
<b>Sent:</b> Monday, May 22, 2017 10:41 AM<br>
<b>To:</b> netarchivesuite-curator@ml.sbforge.org<br>
<b>Subject:</b> [Netarchivesuite-curator] BnF NAS update for May<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial",sans-serif">Hello all,</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">Since mid-March, the BnF has been using NetarchiveSuite 5 and Heritrix 3 for its selective crawls.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">The first conclusion is that the quality of the crawl is better than with Heritrix 1:</span><br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">- The percentage of URLs with an HTTP 2XX response code is above 80 %, whereas with Heritrix 1 it was around 74 %.</span><br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">- The number of 4XX responses is lower than with Heritrix 1. The crawls are also shorter in duration.</span><br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">- Heritrix 3 crawls less content on domains outside the seed list than Heritrix 1; as a consequence, the percentage of images in the crawls has decreased.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">Despite the lack of deduplication for the first selective crawls, the volume of the crawls is lower than we had estimated. For example, the news crawl was previously between 0.25 and 0.3 TB per month,
whereas now it is 0.14 TB. To avoid going over our storage budget, we had decreased all our budgets (in terms of URLs collected) by 10 % with the change to Heritrix 3.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">For the moment, the most significant improvement is in the crawling of HTTPS URLs. For example, more than 30 % of the seeds in the news crawl are in HTTPS, and with Heritrix 3 more than 80 % of them are harvested,
against 69 % with Heritrix 1.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">Heritrix 3 has also allowed us to simplify the harvest of subscription press sites. For this specific crawl, thanks to the new features of Heritrix 3, the engineers were able to merge our
9 harvest templates into only 2: one for HTTP and HTML authentication and one for the FTP crawl. Monitoring and QA are considerably simpler as a result.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">We are continuing to analyse Heritrix 3 to better prepare the broad crawl.</span><br>
<br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">Best regards,</span><br>
<span style="font-size:10.0pt;font-family:"Arial",sans-serif">The BnF digital legal deposit team</span><span style="font-family:"Arial",sans-serif"><o:p></o:p></span></p>
</div>
</body>
</html>