[Netarchivesuite-curator] Following up on NAS curator roadmap
Sabine Schostag
sas at statsbiblioteket.dk
Wed Nov 20 15:55:33 CET 2013
Hi Annick – and all you NAS curators and developers
KB/SB curators have discussed our roadmap now, and I can still say: we agree with you. Just go on and create the issues you mentioned. Then we can all have a look at the roadmap again and set priorities (among other things by putting our institutions on the issues with highest priority).
Just a comment to NASC-30: as you proposed (and we agree) I closed it, but KB/SB wants to replace it with NASC-43 – give the ability to delete or edit a harvest definition name or configuration name as long as it does not have relations to anything in NAS.
Concerning NASC-34 – upgrade to Heritrix 3: it is our highest priority, too!!
I started closing the resolved issues assigned to me.
Best,
Sabine
SABINE SCHOSTAG
LIBRARIAN, WEB CURATOR
DIRECT +45 8946 2148
THE NETARCHIVE
STATSBIBLIOTEKET
STATE AND UNIVERSITY LIBRARY
VICTOR ALBECKS VEJ 1
8000 AARHUS C
DENMARK
VAT NO. 1010 0682
From: annick.lefollic at bnf.fr [mailto:annick.lefollic at bnf.fr]
Sent: Friday, November 15, 2013 3:58 PM
To: Sabine Schostag
Cc: annick.lefollic at bnf.fr; clement.oury at bnf.fr; Colin Samuel Rosenthal; michaela.mayr at onb.ac.at; Mikis Seth Sørensen; nicolas.giraud at bnf.fr; svc at kb.dk; tlr at kb.dk
Subject: RE: Following up on NAS curator roadmap
Hello,
I mentioned the fact that, at BnF, we now harvest news sites (not new websites, but sites concerning news) and we want to have a complete collection of the documents which are available by subscription only (that means PDFs).
So we have to check that all the wanted PDFs are really harvested, using a more precise QA, which could usefully be automated.
all the best,
Annick Le Follic
Officer for documentary techniques and processes
Service du dépôt légal numérique
Département du dépôt légal
Bibliothèque nationale de France
www.bnf.fr
01 53 79 40 27
From: Sabine Schostag <sas at statsbiblioteket.dk>
Sent: 15/11/2013 10:06
To: annick.lefollic at bnf.fr; michaela.mayr at onb.ac.at
Cc: Mikis Seth Sørensen <mss at statsbiblioteket.dk>; svc at kb.dk; Colin Samuel Rosenthal <csr at statsbiblioteket.dk>; nicolas.giraud at bnf.fr; clement.oury at bnf.fr; tlr at kb.dk
Subject: RE: Following up on NAS curator roadmap
Hi Annick.
At SB we had a first look at BnF’s proposal – and we almost agree with you (final answer follows after Monday, when we meet with KB), but we have one question:
NASC-24 (Investigate automated QA tools) – what do you mean by “especially for subscription new sites”?
Anyway, automation of QA would be nice :)
Best,
Sabine
From: annick.lefollic at bnf.fr [mailto:annick.lefollic at bnf.fr]
Sent: Wednesday, October 30, 2013 1:55 PM
To: Sabine Schostag; michaela.mayr at onb.ac.at
Cc: Mikis Seth Sørensen; svc at kb.dk; Colin Samuel Rosenthal; nicolas.giraud at bnf.fr; clement.oury at bnf.fr; tlr at kb.dk
Subject: Following up on NAS curator roadmap
Hello,
In June 2013, Sara sent the curators of our three institutions a message proposing our next priorities for NAS development.
The BnF curator team has reviewed the different items listed on https://sbforge.org/jira/secure/ProjectBoard.jspa?selectedProjectId=10091 and this is the result of our review:
NASC-5 (28 - 29): done
NASC-7: done
NASC-6 (18): maintain, BnF is working on it
NASC-15: cancel?
NASC-19: done
NASC-20: maintain
NASC-22: done but to be continued with a new NASC item named "Sharing experience on WARC strategy during a workshop"
NASC-21: cancel and divide into three NASC items named:
"Harvesting websites with password-protected access": how could BnF share its experience on subscription news sites (on the NAS wiki? in the workshop?)
"Harvesting audio files"
"Harvesting video files": BnF has developed Heritrix modules for YouTube, Dailymotion and Vimeo, and could share the code and write documentation in English for the NAS wiki
NASC-31: cancel
NASC-32: cancel
NASC-33: cancel
NASC-23: maintain (with BnF tool, with Austrian tool)
NASC-24: maintain (especially for subscription new sites)
NASC-25: maintain
NASC-26: maintain
NASC-30: cancel
NASC-34: maintain as first priority
New item: "Give the ability to distribute crawlers in different pools associated with harvest definitions": create as first priority (BnF is currently working on it)
New item: "Ease crawl log extraction on a domain from a job to analyze HTTP responses"
New item: "Add a combined search function on the different lists of global crawler traps"
New item: "Fix the number of ARC files listed in arcfiles-report.txt"
What do you think of these elements?
Do you have other new items?
All the best,
Annick Le Follic
Officer for documentary techniques and processes
Service du dépôt légal numérique
Département du dépôt légal
Bibliothèque nationale de France
www.bnf.fr
01 53 79 40 27
________________________________
Take part in the Grande Collecte 1914-1918 <http://www.bnf.fr/fr/la_bnf/anx_actu_bib/a.grande_collecte_14-18.html>
Before printing, think of the environment.