[Netarchivesuite-curator] Netarchive NAS update for August

Mayr Michaela michaela.mayr at onb.ac.at
Tue Aug 23 13:46:50 CEST 2016


Dear all,

 
· We have finally launched our online search interface, https://webarchiv.onb.ac.at/, and would be interested in your feedback. The archived websites themselves are still not accessible online, but it is possible to search for versions either by URL or in our (partial) full-text index. We built a bookmarking feature that allows users to save versions online and recall them at the library's web archive terminals.

· At the moment we have ongoing selective crawls, and an event crawl on the presidential elections is still running.

 
Best regards

Michaela

 
From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of Sabine Schostag
Sent: Wednesday, 03 August 2016 16:21
To: 'peter.stirling at bnf.fr'; 'netarchivesuite-curator at ml.sbforge.org'
Subject: Re: [Netarchivesuite-curator] Netarchive NAS update for August

 
Hi all,

 
Here follows an update from KB/SB:

 
We are still working on the reorganization of the selective crawls.

 
Following our new collection strategy (extended selective crawls and smaller broad crawls), we now collect all national Danish news media selectively, both newspaper websites and online-only news media.

We are investigating all local news media in order to decide the frequency and depth of future crawls.

 
Heritrix 3 is not able to archive Facebook profiles, but Archive-IT is able to collect them with an API. We will therefore collect about 100 representative open Facebook profiles at Archive-IT; at the moment we are selecting the profiles.

 
All the best, Sabine

 
 
Sabine Schostag

Web curator

NETARCHIVE

STATE AND UNIVERSITY LIBRARY

Victor Albecks Vej 1

8000 AARHUS C

DENMARK

VAT NO. 1010 0682

___________________________________________
http://netarkivet.dk/in-english/

 
From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On Behalf Of peter.stirling at bnf.fr
Sent: Wednesday, August 03, 2016 2:00 PM
To: netarchivesuite-curator at ml.sbforge.org
Subject: [Netarchivesuite-curator] BnF NAS update for August

 
Hello all,

As in previous years, in July we started to work on our annual broad crawl. We have asked for seed lists from our different partners (registrars, registers and BnF databases). In 2016, we have managed to expand the number of domains from TLDs for overseas French departments (.gf, .gp, .mq, .pf) and from regional TLDs (.alsace, .bzh, .paris): at this point, we have more than 36,000 domain names from these TLDs.


In July and August, we are using our tool nas-preload to deduplicate URLs and domains from the seven different identified sources and to check the DNS response, with the aim of transferring only active domains into NAS.
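
For readers who want a concrete picture of this step, here is a minimal Python sketch of the general idea: merge several seed lists, deduplicate them by host name, and keep only the domains that still resolve in DNS. It is not the nas-preload code, whose internals are not described here. The function names (normalize_domain, resolves, active_domains) and the normalization rules are assumptions made purely for illustration.

```python
import socket
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


def normalize_domain(entry: str) -> str:
    """Reduce a URL or bare host name to a lower-case host name (hypothetical rule)."""
    entry = entry.strip()
    # urlsplit only fills .hostname when a "//" authority marker is present.
    if "://" not in entry:
        entry = "//" + entry
    try:
        host = urlsplit(entry).hostname or ""
    except ValueError:
        # Malformed entries (e.g. broken IPv6 brackets) are treated as empty.
        return ""
    return host.rstrip(".")


def resolves(domain: str) -> bool:
    """Return True if the system resolver finds at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def active_domains(
    sources: Iterable[Iterable[str]],
    is_active: Callable[[str], bool] = resolves,
) -> List[str]:
    """Merge seed lists, drop duplicates (first occurrence wins), keep active domains."""
    seen = set()
    unique = []
    for source in sources:
        for entry in source:
            domain = normalize_domain(entry)
            if domain and domain not in seen:
                seen.add(domain)
                unique.append(domain)
    return [d for d in unique if is_active(d)]
```

The is_active parameter is there so the DNS check can be swapped out, for example with a cached or batched lookup when the lists hold tens of thousands of names, or with a stub when testing without network access.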

We have just started to work on the migration from Heritrix 1 to Heritrix 3. We plan to complete the first stage, non-regression testing, at the end of February 2017. This is a big challenge, as we also have to adjust the other tools connected to Heritrix 3.

Best regards,
The BnF Digital Legal Deposit team



_______________________________________________
Netarchivesuite-curator mailing list
Netarchivesuite-curator at ml.sbforge.org
http://ml.sbforge.org/mailman/listinfo/netarchivesuite-curator


