[Netarchivesuite-curator] Netarchive NAS update for August
Sabine Schostag
sas at statsbiblioteket.dk
Wed Aug 3 16:20:46 CEST 2016
Hi all,
Here follows an update from KB/SB:
We are still working on the reorganization of the selective crawls.
Following our new collection strategy (extension of the selective crawls and smaller broad crawls), we now collect all national Danish news media selectively, both newspaper websites and online-only news media.
We are investigating all local news media in order to decide the frequency and depth of future crawls.
Heritrix 3 is not able to archive Facebook profiles, but Archive-IT is able to collect them with an API. We will therefore collect about 100 representative open Facebook profiles with Archive-IT; at the moment we are selecting the profiles.
All the best, Sabine
Sabine Schostag
Web curator
NETARCHIVE
STATE AND UNIVERSITY LIBRARY
Victor Albecks Vej 1
8000 AARHUS C
DENMARK
VAT NO. 1010 0682
___________________________________________
http://netarkivet.dk/in-english/
From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Wednesday, August 03, 2016 2:00 PM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for August
Hello all,
As in previous years, in July we started to work on our annual broad crawl. We have asked for seed lists from our different partners (registrars, registers and BnF databases). In 2016, we have managed to expand the number of domains from TLDs for overseas French departments (.gf, .gp, .mq, .pf) and from regional TLDs (.alsace, .bzh, .paris): at this point, we have more than 36,000 domain names from these TLDs.
In July and August, we are using our tool nas-preload to deduplicate URLs and domains from the seven different identified sources and to check the DNS response, with the aim of transferring only active domains into NAS.
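For readers unfamiliar with this kind of preprocessing, the two steps described above (merging seed lists without duplicates, then keeping only domains that answer in DNS) can be sketched roughly as follows. This is only an illustrative sketch and not the code of nas-preload: all function names are hypothetical, and the DNS check simply uses the system resolver, which may differ from what the real tool does.

```python
import socket
from urllib.parse import urlsplit


def domain_of(entry):
    """Return the lowercased host name of a seed given as a URL or bare domain."""
    entry = entry.strip()
    if "://" not in entry:
        entry = "//" + entry  # lets urlsplit treat a bare domain as the host part
    host = urlsplit(entry).hostname or ""
    return host.rstrip(".").lower()


def merge_sources(sources):
    """Deduplicate domains across several seed lists, keeping first-seen order."""
    seen = {}
    for source in sources:
        for entry in source:
            domain = domain_of(entry)
            if domain:
                seen.setdefault(domain, None)
    return list(seen)


def resolves(domain):
    """Assumed activity check: True if the system resolver returns an answer."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def active_domains(sources, is_active=resolves):
    """Merge all sources, then keep only the domains that pass the activity check."""
    return [d for d in merge_sources(sources) if is_active(d)]
```

Passing the check as the `is_active` parameter makes it possible to try the merging logic on a sample list without any network access, for example `active_domains(lists, is_active=lambda d: True)`.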
We have just started to work on the migration from Heritrix 1 to Heritrix 3. We plan to achieve the first stage of non-regression at the end of February 2017. It is a big challenge as we also have to adjust other tools connected to Heritrix 3.
Best regards,
The BnF Digital Legal Deposit team