[Netarchivesuite-curator] BnF NAS update for September

peter.stirling at bnf.fr peter.stirling at bnf.fr
Fri Sep 11 15:07:07 CEST 2015

Hello all,

First of all, we're pleased to announce that Marie Chouleur arrived on 
1 September to take up the post of head of digital legal deposit at the BnF.

During July and August, we performed technical tests and started a 
trial run for the broad crawl:
- we tried to include IDNs (internationalized domain names): it was 
necessary to rewrite some of them in correct UTF-8 syntax, but this did not 
work with NAS and Heritrix. So we will have to wait for Heritrix 3 to crawl 
these specific domains.
- the new storage array was delivered, and we tried several 
configurations before finding the right one. There were some 
communication problems between the crawlers and the array.
- we changed the operating system of the servers from CentOS 5 to CentOS 
6, which turned out to be a lot of work. At first, we put CentOS 6 on the 
crawlers, but access to the indexer performed much worse than under the 
old system. As a result, each job almost stopped working after two or 
three hours. We tried several configurations before we eventually moved 
the index to an external NFS server on the storage array.

Our engineers have now solved these problems and are running some 
final tests. We hope to be able to start the real crawl in mid-September.

Best regards,
The BnF digital legal deposit team
