[Netarchivesuite-curator] KB DK NAS update for March

Sabine Schostag sas at kb.dk
Tue Apr 9 09:32:14 CEST 2019

Dear all.

Here is a brief update from KB DK.

Broad crawl

We experienced some serious errors with our second broad crawl for 2019. Among other things, one URL generated 250 million objects, possibly because we raised the byte limits. The crawl started on February 15 and is still ongoing.

Selective crawls

We are still focusing on IP-validated access to content behind paywalls. We ran into problems with iframes, but they seem to be solved with Umbra. Another issue is how to store site owners' contact information in compliance with GDPR.

Event crawls

We are preparing for the elections to the EU Parliament and for the Danish parliamentary elections. The latter must take place in June 2019 at the latest.

Together with a colleague, a researcher, we ran a mini event crawl on April Fools' Day, using BCWeb for the nomination of URLs. Since you never know where April Fools' jokes will pop up, the researcher wanted us to crawl with 4 hops. As a result, the crawls are still ongoing. Part of the evaluation will be to consider crawling with fewer hops next time.

After the war in the 1860s, Denmark lost part of Southern Jutland to Prussia/Germany. After WW1, Southern Jutland became Danish again. Next year we will celebrate the centennial of this reunification, and preparations are already appearing on the internet. We are therefore preparing an event crawl and have already collected about 40 URLs.

Most urgent technical issue

Our Citrix Wayback access platform is performing very badly: among other things, it can take over 5 minutes to load a page, and many images are not displayed.

On behalf of the Netarchive team


From: Netarchivesuite-curator <netarchivesuite-curator-bounces at ml.sbforge.org> On Behalf Of peter.stirling at bnf.fr
Sent: Tuesday, March 12, 2019 9:54 AM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for March

Hello all,

Early this month, we launched a selective project crawl for the European elections, taking place on May 26. Two collection departments (Law, Economics and Politics; and Philosophy, History and Humanities) are taking part in the selection process, which is expected to last until June. Monthly, weekly and twice-daily crawl frequencies have been chosen. A large share of the selected websites is expected to come from social networks, particularly Twitter.

We are continuing work on BCweb, both updating the graphics and putting in place web services for accessing the data. Attached is the model for the homepage; the new graphics for this and the other pages will be integrated into the application in the coming weeks.

Best regards,
The BnF digital legal deposit team

