[Netarchivesuite-curator] BnF NAS update for July

sara.aubry at bnf.fr
Wed Jul 3 10:32:10 CEST 2019

Hello Alicia,

You should check which JobGenerator is configured in your NAS settings.
At BnF, we use the FixedDomainConfigurationCountJobGenerator, which lets us
set a fixed number of domains per job for snapshot harvests and a separate
fixed number of domains per job for focused crawls.
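To illustrate the idea of a fixed domain count per job, here is a minimal Python sketch. It is not NetarchiveSuite code: the function name `split_into_jobs`, the `Job` type, the limits of 100 and 20, and the representation of domain configurations as plain strings are all hypothetical. A real job generator may also apply further constraints (for example grouping by harvest template), which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical limits: maximum number of domain configurations per job,
# with one value for snapshot harvests and another for focused crawls.
# The numbers are illustrative only, not NAS defaults.
MAX_DOMAINS_PER_JOB: Dict[str, int] = {
    "snapshot": 100,
    "focused": 20,
}


@dataclass
class Job:
    harvest_type: str
    domain_configs: List[str]


def split_into_jobs(domain_configs: List[str], harvest_type: str) -> List[Job]:
    """Split domain configurations into jobs of at most N configurations,
    where N depends on the harvest type (snapshot or focused)."""
    limit = MAX_DOMAINS_PER_JOB[harvest_type]
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    jobs = []
    for start in range(0, len(domain_configs), limit):
        # Each slice holds up to `limit` configurations; the last may be smaller.
        jobs.append(Job(harvest_type, domain_configs[start:start + limit]))
    return jobs


if __name__ == "__main__":
    configs = [f"domain{i}.example" for i in range(45)]
    jobs = split_into_jobs(configs, "focused")
    print([len(job.domain_configs) for job in jobs])  # [20, 20, 5]
```

In this sketch, lowering the focused-crawl limit yields more, smaller jobs. Note that the limit counts domains, not seed URLs, so the number of URLs in a job still depends on how many seeds each domain carries.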




From:    "Pastrana García, Alicia" <alicia.pastrana at bne.es>
To:      "geraldine.camile at bnf.fr" <geraldine.camile at bnf.fr>,
"netarchivesuite-curator at ml.sbforge.org"
<netarchivesuite-curator at ml.sbforge.org>
Cc:      "bert.wendland at bnf.fr" <bert.wendland at bnf.fr>,
"clara.wiatrowski at bnf.fr" <clara.wiatrowski at bnf.fr>, "DDL_DLN at bnf.fr"
<DDL_DLN at bnf.fr>, "leslie.bellony-ext at bnf.fr" <leslie.bellony-ext at bnf.fr>
Date:    02/07/2019 13:33
Subject: Re: [Netarchivesuite-curator] BnF NAS update for July
Sent by: "Netarchivesuite-curator"
<netarchivesuite-curator-bounces at ml.sbforge.org>

Hello all,
Here is our update:
We are working on our three event crawls (European Parliament elections,
local elections and Spanish Government elections), which are still running.
We have had great collaboration from the different regions for the local
elections, and we have nominated over 3,700 sites.
We still don't know when we are going to launch our broad crawl, but it
will probably be in September.
Here is the problem I told you about; I hope you can help.
One of our collections contains a very large harvest, and we cannot crawl
all of its seeds. The problem is the way the harvest is divided into jobs.
For example, with version 5.3, a harvest into which we had loaded 15,000
URLs generated 29 jobs of 620 URLs each and one job with the rest. Since we
upgraded to 5.4, it generates jobs of 2,096 URLs, which causes a disk space
problem on the crawler machines because their local disks are small. We use
the same template as in 5.3, so we don't know why the division is
different. Do you know what could cause this? Is there a parameter in NAS
(or in the templates) that we can modify to reduce the number of URLs in
each job?
If you have any questions about this, please do not hesitate to ask me.
Thank you!
Alicia Pastrana García
Área de Gestión del Depósito de las Publicaciones en Línea
División de Procesos y Servicios Digitales
Tel.: 91 516 89 92
Biblioteca Nacional de España
From: Netarchivesuite-curator
[mailto:netarchivesuite-curator-bounces at ml.sbforge.org] On behalf of
geraldine.camile at bnf.fr
Sent: Tuesday, 2 July 2019 11:50
To: netarchivesuite-curator at ml.sbforge.org
CC: bert.wendland at bnf.fr; leslie.bellony-ext at bnf.fr; DDL_DLN at bnf.fr;
clara.wiatrowski at bnf.fr
Subject: [Netarchivesuite-curator] BnF NAS update for July
Hello all,

In March, we launched a selective project crawl for the European elections,
which will come to an end in the coming days. Fifteen curators contributed
to the nomination of over 1,480 sites, among which social networks (mostly
Twitter, but also Facebook and YouTube channels) represent the largest
share (around 60%). In total, 18 weekly, 5 monthly and over 120 daily
crawls were run. We contributed 85 sites to the collaborative crawl on the
European election results launched by Ricardo Basilio.

We also contributed 85 sites to the collaborative crawl on artificial
intelligence.

Best regards,
The BnF digital legal deposit team

Exhibitions: Manuscrits de l'extrême (until 7 July 2019) | François-Mitterrand
and Le Monde en sphères (until 21 July 2019) | François-Mitterrand
Please consider the environment before printing.
The information in this e-mail and any attachments is confidential and
intended for the addressee only. If you have received this e-mail in error,
you are notified that any revision, amendment, printing, copying,
disclosure, distribution or use of its contents is unauthorized. Carrying
out any of the above actions is expressly prohibited by the rules governing
this matter. Hence, if you are not the intended recipient, we request that
you notify the sender by replying to this e-mail and delete the message and
any attachments. The National Library of Spain reserves the right to take
appropriate legal action if the above is infringed.
Netarchivesuite-curator mailing list
Netarchivesuite-curator at ml.sbforge.org

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://ml.sbforge.org/pipermail/netarchivesuite-curator/attachments/20190703/f6be2322/attachment-0001.html>
