[Netarchivesuite-curator] BnF NAS update for June

geraldine.camile at bnf.fr
Fri Jun 2 11:04:57 CEST 2017


Hello all,

This month we are starting work again on full-text indexing, having 
developed a prototype and an experimental interface "Archives de 
l'internet Labs" in 2015 and 2016. At that time we used Solr along with 
tools from Netsearch and Web Archive Discovery to index our oldest 
archives, from the period 1996-2000.

The main objective for this year is to index our daily news crawl from its 
start in late 2010 until the end of 2016. As part of this work we will 
aim to improve our indexing process, in particular the algorithms applied 
to the text, so that search results are more relevant for users. We will 
also be working on the interface to make it more modular, both to make it 
easier to include new collections in the future and to enable us to use 
the search function in our main access interface. Other functions (saved 
searches, corpus creation) will remain in the experimental Labs interface 
for the moment.

In relation to these developments, we hope to work with a team of 
researchers in linguistics who are studying the creation and use of 
neologisms in French. If the project goes ahead, they will use the news 
crawl in their work, and will bring their expertise to improving the text 
processing algorithms in our indexing process.

Best regards,
The BnF digital legal deposit team