[Netarchivesuite-curator] Netarchive NAS update for July

Sabine Schostag sas at statsbiblioteket.dk
Thu Jul 30 15:03:54 CEST 2015

Hi all,

We have started our second broad crawl of 2015, with a limit of 10 GB per domain.

We are about to finish our event harvest of the parliamentary elections.

At the beginning of the month we ran into problems harvesting dr.dk, the website of the Danish public service broadcasting corporation, because they had changed their website.
Harvesting some of the content probably caused our system to crash, and we are not able to collect content from dr.dk’s HTTPS pages. Upgrading to Java 7 might solve these problems.

We are still working on a best practice for hiding personally sensitive content in our archive. This is required before we can launch full-text search and open it to our users.


Sabine Schostag
librarian - web curator
STATSBIBLIOTEKET
CVR/SE 1010 0682 – EAN 579800079108
Direct: 8946 2148
Historical newspaper pages, TV/radio and commercials: www.mediestream.dk

From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Thursday, July 09, 2015 3:43 PM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for July

Hello all,

As there are regional elections in December, we will launch the broad crawl early (from September to November), with the aim of having enough crawlers for each project. This period is also important for our infrastructure work. We are working closely with several IT teams to make sure we will have enough bandwidth dedicated to web archiving (400 Mb/s); we will upgrade the operating system from CentOS 5 to CentOS 6 and may replace some servers; and a new storage array is arriving, which will need to be installed and its workflow configured.
In terms of content, the objectives will be the same as last year.

The annual project crawl on social movements has grown in 2015 thanks to the participation of a researcher from Aix-Marseille University, who is studying how North African immigration is discussed on the web. She took part in selecting websites, blogs and forums: 216 new URLs have been added, bringing the total to 515 URLs across 109 different domains. We encourage collaboration with researchers on selection work as much as possible.

Best regards,
The BnF legal deposit team

Piaf exhibition<http://www.bnf.fr/fr/evenements_et_culture/anx_expositions/f.piaf.html> - from 14 April 2015 to 23 August 2015 - BnF - François-Mitterrand
