<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000080;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Dear all,</p>
<p><br>
</p>
<p>Here is the update from DK.</p>
<p><br>
</p>
<p><font face="Times New Roman"></font></p>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri">As part of the organizational changes at KB DK, Netarchive is now part of one department, the department for digital cultural heritage. A group is working
on integrating policies and strategies for all material belonging to digital cultural heritage.</font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri"> </font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri">A project group (two curators are part of the group) is working to clarify our future front end for archive users. An important point is the
replay of pages with the https-protocol.</font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri"> </font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri">The event harvest for the collective negotiations on pay for public employees continues while the negotiations are ongoing. The mediator has postponed potential
strikes and lockouts. We have received some input from curators at the Workers Museum. Most important for us is that they are very motivated, and we look forward to future cooperation with BCWeb.</font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri"> </font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri">We are setting up a pilot project for the use of BCWeb with archivists from a local archive on a specific event in the middle of June:
</font><a id="LPlnk541684" href="https://folkemoedet.dk/en/" previewremoved="true"><font color="#0000ff" face="Calibri">https://folkemoedet.dk/en/</font></a><font face="Calibri">
</font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri"> </font></span></p>
<font face="Times New Roman"></font>
<p style="margin: 0cm 0cm 0pt;"><span lang="EN-US" style="mso-ansi-language: EN-US;"><font face="Calibri">KB DK has sent out the job advertisement for a lead coordinator for Netarchive: the ideal person for this job would have both curatorial and technical
skills within web archiving and experience as a project lead.</font></span></p>
<font face="Times New Roman"></font>
<p></p>
<p><br>
</p>
<p>On behalf of the Netarchive Team</p>
<p>Best, Sabine</p>
<p><br>
</p>
<p><br>
</p>
<div style="color: rgb(0, 0, 0);">
<hr tabindex="-1" style="width: 98%; display: inline-block;">
<div id="divRplyFwdMsg" dir="ltr"><font color="#000000" face="Calibri, sans-serif" style="font-size: 11pt;"><b>From:</b> Netarchivesuite-curator &lt;netarchivesuite-curator-bounces@ml.sbforge.org&gt; on behalf of peter.stirling@bnf.fr &lt;peter.stirling@bnf.fr&gt;<br>
<b>Sent:</b> 6 April 2018 13:48<br>
<b>To:</b> netarchivesuite-curator@ml.sbforge.org<br>
<b>Subject:</b> [Netarchivesuite-curator] BnF NAS update for April</font>
<div> </div>
</div>
<div><font face="sans-serif" size="2">Hello all,</font><br>
<br>
<font face="sans-serif" size="2">As mentioned in previous updates last year, over the past months we have been working on our full-text indexing process and the search interface. Last week we opened access to the new version of our full-text search application
"Archives de l'internet Labs", which is now integrated with the main interface. It has been updated with new graphics and new functions, in particular the grouping of identical URLs in search results. We have to thank Toke for his help with this, as we had
real performance problems when we applied this grouping to searches with large numbers of results. Toke advised us to deactivate the count in Solr of the number of groups generated (which we used to calculate the number of pages of results), and
this made a huge improvement. We also tidied up the code of the application with a view to allowing its use by other institutions that use warc-indexer.</font><br>
<br>
<font face="sans-serif" size="2">On the indexing side, the main objective of the project was to index our daily news crawl since its creation at the end of 2010; we originally aimed to index the period up to the end of 2016 but we were able to extend this to
the end of 2017. This means we were also able to process WARCs containing revisit records, which we have been producing since changing to Heritrix 3 last year. We worked with the community on the latest version of warc-indexer, in particular to define collection
names based on W/ARC filenames, and therefore on the harvest definitions in NAS. The news crawl represented an increase in the amount of data indexed compared to the collections previously indexed (around 13 TB, compared to around 2.5 TB) and also in terms
of the final index size, which roughly doubled to around 2.4 TB. To handle this we also put in place a new infrastructure, and performance is much better than with our previous prototype, though we will aim to continue work on the configuration in future
developments.</font><br>
<br>
<font face="sans-serif" size="2">Work on the research project that is using the news crawl to study neologisms is ongoing. We are working with the research team to see if some of the analyses that they apply, such as Named Entity Recognition and Topic Modelling,
can be included in our indexing and search systems.</font><br>
<br>
<font face="sans-serif" size="2">Best regards,</font><br>
<font face="sans-serif" size="2">The BnF digital legal deposit team</font><font face="sans-serif">
<hr>
<p><strong><a href="http://www.bnf.fr/fr/collections_et_services/anx_bib_num/a.gallica_20ans.html">20 years of Gallica: the largest open-access digital library celebrates its anniversary</a></strong></p>
<strong>
<p style="color: rgb(0, 128, 0);"><strong>Before printing, think of the environment.</strong></p>
</strong></font></div>
</div>
</div>
</body>
</html>