[Netarchivesuite-curator] BnF NAS update for July

Pastrana García, Alicia alicia.pastrana at bne.es
Tue Jul 2 13:32:56 CEST 2019

Hello all,

Here is our update:

We are working on our three event crawls: the European Parliament elections, the local elections and the Spanish Government elections, which are still running. We have had great collaboration from the different regions on the local elections, and we have nominated over 3,700 sites.

We still don't know when we will launch our broad crawl, but it will probably be in September.

Here is the problem I told you about. I hope the description is clear:

There is a huge harvest in one of our collections and we can't crawl all the seeds in it. The problem is in how the harvest is divided into jobs. For example, with version 5.3 we loaded 15,000 URLs into a harvest and it generated 29 jobs of 620 URLs each and one job with the rest. Since we updated to 5.4, it generates jobs of 2,096 URLs, which creates a local disk problem on the spiders because their disks are small. We use the same template as in 5.3, and we don't know why the division is different. Do you know what could be causing this? Is there a parameter in NAS (or in the templates) that we can modify to reduce the number of URLs in each job?
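
To make the effect of the per-job size concrete, here is a minimal sketch of the arithmetic, assuming jobs are simply filled up to a fixed URL limit. This is only an illustration, not NetarchiveSuite's actual job-generation logic, and the function name `split_into_jobs` is hypothetical. The figures quoted above are approximate (29 jobs of 620 URLs would already be 17,980 URLs), so the output will not match them exactly.

```python
def split_into_jobs(total_urls: int, urls_per_job: int) -> list[int]:
    """Return the size of each job when URLs are packed up to a fixed limit.

    Hypothetical helper for illustration only; NAS does not expose this function.
    """
    if urls_per_job <= 0:
        raise ValueError("urls_per_job must be positive")
    full_jobs, remainder = divmod(total_urls, urls_per_job)
    sizes = [urls_per_job] * full_jobs
    if remainder:
        # The last job takes whatever is left over.
        sizes.append(remainder)
    return sizes


# Compare the two per-job sizes mentioned in the message.
for limit in (620, 2096):
    sizes = split_into_jobs(15_000, limit)
    print(f"limit={limit}: {len(sizes)} jobs, largest job={max(sizes)} URLs")
```

Under this assumption, a larger per-job limit means fewer but much bigger jobs, so each crawler has to hold more data on its local disk at once.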

If you have any questions about this, please do not hesitate to ask me.

Thank you!

Alicia Pastrana García
Área de Gestión del Depósito de las Publicaciones en Línea
División de Procesos y Servicios Digitales
Tfno.: 91 516 89 92
Biblioteca Nacional de España

From: Netarchivesuite-curator [mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On behalf of geraldine.camile at bnf.fr
Sent: Tuesday, 2 July 2019 11:50
To: netarchivesuite-curator at ml.sbforge.org
CC: bert.wendland at bnf.fr; leslie.bellony-ext at bnf.fr; DDL_DLN at bnf.fr; clara.wiatrowski at bnf.fr
Subject: [Netarchivesuite-curator] BnF NAS update for July

Hello all,

In March, we launched a selective project crawl for the European elections, which will end in the coming days. 15 curators contributed to the nomination of over 1,480 sites, among which social networks (mostly Twitter, but also Facebook and YouTube channels) represent the largest share (around 60%). In the end, 18 weekly, 5 monthly and over 120 daily crawls were run. We contributed 85 sites to the collaborative crawl launched by Ricardo Basilio on the European elections results.

We also contributed 85 sites to the collaborative crawl on artificial intelligence.

Best regards,
The BnF digital legal deposit team

