[Netarchivesuite-curator] BnF NAS update for March
peter.stirling at bnf.fr
Tue Mar 12 15:00:07 CET 2013
Hello,
The big news here for March is that we have started transferring our web
archives into the BnF digital repository, SPAR, which will ensure the
long-term preservation of our collections. We have started with the
current crawls, but we will be progressively loading the retrospective
collections simultaneously with the ongoing crawls, starting with the most
recent collections (those harvested with NAS) and working our way back to
the historical collections from 1996. It will take at least several months
and possibly up to a few years to complete the transfer of all our
collections.
The ingest into SPAR is closely linked to the functioning of NAS: in
addition to the crawled data produced by Heritrix, SPAR will also preserve
the metadata ARC files produced by NAS, containing the configurations,
reports and logs that describe the crawls. This allows SPAR to create
coherent collections of data using three levels: the ARC, the crawl job
(containing ARCs of both data and metadata) and the harvest definition
(containing the jobs). The data model of SPAR is thus based on that of
NAS, but will also be applied to earlier kinds of crawls (such as
standalone Heritrix crawls performed by the BnF, broad crawls by the
Internet Archive and historical collections extracted by IA).
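For illustration only, the three levels described above could be sketched
in Python as follows. This is not SPAR's actual schema: every class, field
and file name below is a hypothetical stand-in chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArcFile:
    """Lowest level: a single ARC file (hypothetical representation)."""
    name: str
    is_metadata: bool = False  # True for the metadata ARCs produced by NAS


@dataclass
class CrawlJob:
    """Middle level: a crawl job holding both data and metadata ARCs."""
    job_id: int
    arcs: List[ArcFile] = field(default_factory=list)

    def data_arcs(self) -> List[ArcFile]:
        # ARCs holding the crawled data written by Heritrix
        return [a for a in self.arcs if not a.is_metadata]

    def metadata_arcs(self) -> List[ArcFile]:
        # ARCs holding configurations, reports and logs
        return [a for a in self.arcs if a.is_metadata]


@dataclass
class HarvestDefinition:
    """Top level: a harvest definition containing its jobs."""
    name: str
    jobs: List[CrawlJob] = field(default_factory=list)

    def all_arcs(self) -> List[ArcFile]:
        # Flatten the hierarchy: every ARC of every job, in order
        return [arc for job in self.jobs for arc in job.arcs]


# Hypothetical example: one harvest definition with a single job
job = CrawlJob(
    job_id=1,
    arcs=[
        ArcFile("1-data-00000.arc"),
        ArcFile("1-metadata-1.arc", is_metadata=True),
    ],
)
harvest = HarvestDefinition(name="example-harvest", jobs=[job])
```

In this sketch a standalone Heritrix crawl or an Internet Archive broad
crawl would be described with the same three objects, even though NAS did
not produce it.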
Alongside the ingest of all our existing collections, work will continue on
SPAR to allow it to handle WARC files, as this is a necessary step before
we can switch our harvesting workflow to the production of WARCs.
Best regards,
The BnF digital legal deposit team