[Netarchivesuite-curator] BnF NAS update for October
Anders Klindt Myrvoll
ANKM at kb.dk
Mon Oct 3 18:56:51 CEST 2022
Dear all
Here is the news from Denmark:
* The 3rd broad crawl of 2022 is almost finished. We are aiming to end it on October 10th so that we can run the 4th broad crawl of 2022 (which is the norm).
* Our focus on paywalls and IP validation has paid off. We now get important content from quite a few more sites.
* Anders attended:
  * The "Wanted: Social Media Data" conference in Brussels (https://www.kbr.be/en/agenda/wanted-social-media-data/), with the presentation Social media archiving at the Royal Danish Library_Sept_2022.pdf <https://sbforge.org/display/NAS/2022-10-04+Statusmeeting?preview=%2F107872496%2F107872500%2FSocial+media+archiving+at+the+Royal+Danish+Library_Sept_2022.pdf>
  * The conference "Digital models in humanities research" (https://www.it-vest.dk/events/conference-about-digital-models-in-humanities-research): pretty interesting, with a lot of interest in web and social media data.
* We are almost finished with the updated JWAT for validating WARC files.
* Our access platform, SolrWayback, is getting its own Citrix VLAN and will have Flash Player, Gephi and R installed, so it will serve as a small workspace.
* SolrWayback: We are transferring internal issues from Jira to https://github.com/netarchivesuite/solrwayback. It is also great to see more institutions using SolrWayback and contributing to the code (bug fixes, and hopefully more in the future).
* Lots of data dump deliveries this month and on the horizon.
* CDX summary of Netarkivet's holdings: we are not able to participate at the moment.
  * https://github.com/ymaurer/cdx-summarize
  * https://netpreserveblog.wordpress.com/2022/08/10/investigate-holdings-of-web-archives-through-summaries-cdx-summarize/
See you tomorrow,
Anders
From: Netarchivesuite-curator <netarchivesuite-curator-bounces at ml.sbforge.org> On Behalf Of auriane.quoix at bnf.fr
Sent: Monday, October 3, 2022 3:47 PM
To: netarchivesuite-curator at ml.sbforge.org
Cc: DBN_DLWEB at bnf.fr
Subject: [Netarchivesuite-curator] BnF NAS update for October
Dear all,
Our 2022 broad crawl is about to be launched. This year, we were able to increase the budget to 2,700 URLs per domain, for a total of around 145 TB. Each job will end 3 days after its launch. The crawl is expected to finish in mid-November.
We have started preparing two virtual guided tours. The first will be published in December 2022 and will highlight our collections relating to artificial intelligence. Work on the second has just started: it will cover the Elections collections (2015-2022), and publication is scheduled for the first quarter of 2023.
In October, we will start preparing our next internal harvesting workshop, scheduled for November.
It will be devoted to harvesting podcasts and sound documents, and we will work with the BnF's Sound, Video and Multimedia department.
Best regards,
The BnF digital legal deposit team
________________________________
Come and discover the new BnF museum at Richelieu <https://www.bnf.fr/fr/le-musee-de-la-bnf>.
Molière exhibitions until 15 January 2023: Molière, le jeu du vrai et du faux <https://www.bnf.fr/fr/agenda/moliere-le-jeu-du-vrai-et-du-faux> at Richelieu and Molière en musiques <https://www.bnf.fr/fr/agenda/moliere-en-musiques> at the Opéra.
Exhibition Marcel Proust. La fabrique de l'œuvre <https://www.bnf.fr/fr/agenda/marcel-proust> from 11 October 2022 to 22 January 2023, at the François-Mitterrand site.
Please consider the environment before printing.