[Netarchivesuite-curator] BnF NAS update for June
geraldine.camile at bnf.fr
Fri Jun 2 11:04:57 CEST 2017
Hello all,
This month we are starting work again on full-text indexing, having
developed a prototype and an experimental interface "Archives de
l'internet Labs" in 2015 and 2016. At that time we used Solr along with
tools from Netsearch and Web Archive Discovery to index our oldest
archives, from the period 1996-2000.
The main objective for this year is to index our daily news crawl from its
start in late 2010 until the end of 2016. As part of this work we aim to
improve our indexing process, in particular the algorithms applied to the
text, so that results are more relevant for users. We will also make the
interface more modular, both to make it easier to include new collections
in the future and to let us use the search function in our main access
interface. Other functions (saved searches, corpus creation) will stay in
the experimental Labs interface for the moment.
In relation to these developments, we hope to work with a team of
researchers in linguistics who are studying the creation and use of
neologisms in French. If the project goes ahead, they will use the news
crawl in their work, and will bring their expertise to improving the text
processing algorithms in our indexing process.
Best regards,
The BnF digital legal deposit team