[Netarchivesuite-curator] Summer update from Netarchive

Sabine Schostag sas at kb.dk
Tue Sep 8 10:09:20 CEST 2020


Dear all,

In brief, here is what we have worked on since our last meeting:

Broad crawl
We started our second broad crawl of 2020 on 20 August; the first step, with a byte limit of 50 MB, finished on 2 September. On 21 August we started the separate crawl of ultra-big sites, which is still running.

Event crawl
We have to decide whether to stop the event crawl on Corona in Denmark; opinions on this differ. What are you doing with your event crawls on Covid-19?

Miscellaneous
Everything is ready for the French trainee: the contract is signed and he will start on 28 September. He wants to work on data visualization and Netarchive.

We have started a collaboration with the IT University of Copenhagen: students taking a course on project work and communication for software developers will work with us on several specific challenges.

We are trying to solve various technical issues, most of which we became aware of through emails from people working with particular websites. Examples include:

- URLs that do not change when you click on links from the front page (gaffa.dk)
- Embedded tables (dfi.dk)
- Sites that need a JavaScript interpreter to render their pages (rehpa.dk)

All the best, on behalf of the Netarchive Team

Sabine

Det Kgl. Bibliotek
Royal Danish Library
