<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body>
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Dear all</p>
<p><br>
</p>
<p>Here's the update from Denmark. Please feel free to add more :-)</p>
<p><br>
</p>
<p></p>
<ul style="margin: 0px; color: rgb(51, 51, 51); font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, 'Fira Sans', 'Droid Sans', 'Helvetica Neue', sans-serif; font-size: 14px;">
<li>Broad crawl
<ul style="margin: 0px; list-style-type: disc;">
<li>1st broad crawl of 2023, step 1: still running, closing soon.</li><li>Issues with Hadoop, updating to RHEL8 and more</li></ul>
</li><li>"Browser-based crawling for all" IIPC project still awaiting funding for development this year</li><li>Pull requests for Browsertrix crawler behaviour and Instagram</li><li>Anders is writing a blog post/update on the Browsertrix IIPC project. It will use the closing of the semi-upscale supermarket chain Irma as an example (crawling Facebook, Instagram, TikTok, Twitter and maybe some embedded videos)</li><li>KB will focus more on the Browsertrix project over the next month</li><li>Focus on goals for 2023 and on what we currently can't do with our 3.5-5.5 FTE working on web archiving, hoping to make a strong case for more resources</li><li>Twitter API harvesting has stalled a bit, also awaiting the new paid API solution (9th of February)</li><li>Browsertrix: status of the project and how KB has used it and might use it in the future, presented by Anders from KB with BnF on the 16th of February (online meeting)</li><li>Figuring out a way to visualize web crawling for KB's permanent photo exhibition (Gephi or maybe even a screen recording of browser-based crawling progress)</li><li>Data dumps: 3000+ PDFs, defacements from the Danish web (crawl times) and some lists, plus CDX-summary-like extraction of data for Janne/AU (WARCnet project)</li><li>SolrWayback 4.4.0 software bundle has been released
<ul style="margin: 0px; list-style-type: disc;">
<li>SolrWayback bundle release 4.4.0 can be downloaded here:<span style="letter-spacing: 0px;"> </span><a href="https://github.com/netarchivesuite/solrwayback/releases/tag/4.4.0" class="external-link" rel="nofollow" style="color: rgb(59, 115, 175); text-decoration-line: none;" id="LPlnk22112">https://github.com/netarchivesuite/solrwayback/releases/tag/4.4.0</a></li><li><a href="https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md" class="external-link" rel="nofollow" style="color: rgb(59, 115, 175); text-decoration: var(--aui-link-decoration); letter-spacing: 0px;">https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md</a><em style="letter-spacing: 0px;"><span style="color: rgb(36, 41, 47);"><br>
<br>
'Visualization of search result by domain' can now be shown by day, week and month instead of only by year. The same goes for the domain statistics in the toolbar. This is useful for recent collections that do not go back years. </span></em><em style="letter-spacing: 0px;"><span style="color: rgb(36, 41, 47);">(see </span><a href="https://github.com/netarchivesuite/solrwayback/issues/270" class="external-link" rel="nofollow" style="color: rgb(59, 115, 175); text-decoration-line: none;">#270</a><span style="color: rgb(36, 41, 47);">).
Thanks to Leslie Bellony from BnF for implementing this.</span></em></li></ul>
</li></ul>
<p></p>
<p><br>
</p>
<p>Best,</p>
<p>Anders</p>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Netarchivesuite-curator &lt;netarchivesuite-curator-bounces@ml.sbforge.org&gt; on behalf of auriane.quoix@bnf.fr &lt;auriane.quoix@bnf.fr&gt;<br>
<b>Sent:</b> 06 February 2023 18:31:06<br>
<b>To:</b> netarchivesuite-curator@ml.sbforge.org<br>
<b>Cc:</b> DBN_DLWEB@bnf.fr<br>
<b>Subject:</b> [Netarchivesuite-curator] BnF NAS update for February</font>
<div> </div>
</div>
<div><font size="2" face="Verdana">Dear all,</font><br>
<font size="2" face="Verdana"><br>
First of all, this month, we are going to launch an internal project to improve several of our harvests. The project will run until July. It includes several parts:<br>
- improvement of the harvest of social networks (Twitter, Facebook, Instagram)<br>
- experiments with Browsertrix within the framework of our next internal harvesting workshop in March.<br>
- improvement of the press sites harvest<br>
- setting up Podcast and TikTok harvests.<br>
<br>
At the end of January, Wayback version 8.10.0 was released. This new version includes the publication of our new virtual guided tour on Artificial Intelligence.<br>
This guided tour is made up of 13 themes. The topics covered range from scientific and technical applications of AI to ethical issues, and include the link between AI and art or human sciences.<br>
The sites presented in the guided tour were selected for the Artificial Intelligence harvest launched for the first time in December 2020, but there are also older captures, some of them dating from the early 2000s.<br>
On this occasion, a homepage of the "Archives de l’internet" on the subject of artificial intelligence has been republished.<br>
<br>
A new video crawl has been running since January 26th. We are harvesting 13 YouTube channels, with an estimated size of 4.8 TB.<br>
<br>
Best regards,</font><br>
<font size="2" face="Verdana"><br>
The BnF digital legal deposit team</font><br>
<font face="sans-serif">
<hr>
<p>Come and discover <strong><a href="https://www.bnf.fr/fr/le-musee-de-la-bnf">the new BnF museum at Richelieu</a></strong>.</p>
<p style="color:#008000"><strong>Please consider the environment before printing.</strong></p>
</font></div>
</body>
</html>