From sas at kb.dk Tue Nov 5 08:42:16 2019
From: sas at kb.dk (Sabine Schostag)
Date: Tue, 5 Nov 2019 07:42:16 +0000
Subject: [Netarchivesuite-curator] Monthly update from KB DK
Message-ID:

Dear all,

Here is a brief update on what we are mainly busy with:

Because of a political decision, a broadcast station (Radio 24/syv) had to close down on 31 October. The announcement that this popular station would close raised a storm of reactions on social media. People asked whether KB DK was going to keep the station's archive. We tried to capture as many podcasts as possible with Umbra. For QA of the harvested content we had to wait for an index to be generated (there was a long queue in the index generator), and it was not ready until 4 November.

As Yahoo Groups is going to close down too, we crawled Danish Yahoo Groups over the last couple of days.

The fourth broad crawl for 2019 is in preparation: there will be a step 1 with a domain limit of 50 MB and a step 2 with a domain limit of 16 GB. Together with this broad crawl we will run the following selective crawls: Research databases, Municipalities and regions, Ministries and Government Agencies, YouTube. We will crawl with NAS 5.5 and expect step 1 to last about 2 weeks and step 2 about 6-8 weeks.

Other projects keeping us busy:

* Work on risk assessment
* Implementation of SolrWayback
* Consolidation of BCWeb (building up a community)
* Revision of collection strategies
* Capture of content behind paywalls - the never-ending story

On behalf of the Netarchive Team

Best,
Sabine

From geraldine.camile at bnf.fr Fri Nov 15 18:18:53 2019
From: geraldine.camile at bnf.fr (geraldine.camile at bnf.fr)
Date: Fri, 15 Nov 2019 18:18:53 +0100
Subject: [Netarchivesuite-curator] BnF NAS update for November
Message-ID:

Hello all,

We released the latest version of BCWeb.
It includes a complete revision of the code with the new graphical user interface, the latest administration functionalities and the data web services.

Since 7 October, when the latest version of NAS (5.6) was installed, we have no longer been able to crawl some news websites that require authentication, because Heritrix no longer sends the cookies in its HTTP requests. It works if we use the previous version of Heritrix, but the NAS-Heritrix version recorded in the WARC metadata is still wrong. This problem probably affects all our focused and broad crawls, but we do not yet know exactly what the consequences are.

On 1 September we welcomed a researcher, Alexander Laporte, who will work on the First World War collection. We are waiting to learn his precise needs. In this context, we are considering a full-text operation.

Best regards,
The BnF digital legal deposit team