[Netarchivesuite-users] Problem with a spider

Soleto Ruiz de Clavijo, Miguel miguel.soleto at externos.bne.es
Mon Sep 20 10:43:22 CEST 2021


Thank you very much, Sara!

From: sara.aubry at bnf.fr <sara.aubry at bnf.fr>
Sent: Monday, 20 September 2021 10:22
To: Soleto Ruiz de Clavijo, Miguel <miguel.soleto at externos.bne.es>
CC: Monzón, Fernando <f.monzon at bne.es>; García Arratia, Juan Carlos <juancarlos.garcia at bne.es>; netarchivesuite-users at ml.sbforge.org
Subject: RE: [Netarchivesuite-users] Problem with a spider

Hi Miguel,

The limit on open files defined on your system is probably too low.
See https://linuxcommand.org/lc3_man_pages/ulimith.html

At BnF, we set it to unlimited for the harvest controllers (and have never had any issues).
At the very least, you could double the value you currently have.

Sara
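Following Sara's advice, a minimal sketch for checking how close a process is to its open-file limit. It assumes a Linux host with /proc, which the ulimit reference implies. It reports how many files a process has open and the soft/hard "Max open files" limits it runs under. The process could be the harvest controller's JVM, given its PID. It also shows the "double it" step for the current process only. The function names are hypothetical helpers, not part of NetarchiveSuite or Heritrix.

A running JVM's limit cannot be changed from outside this way. The lasting change belongs wherever the harvest controllers are launched. That could be `ulimit -n` in the start script, a `nofile` entry in /etc/security/limits.conf for login sessions, or `LimitNOFILE=` in a systemd unit. Which of these applies to a given installation is an assumption.

```python
import os
import resource
import sys


def _parse_limit(value):
    """Convert a /proc limits field to an int, keeping "unlimited" as-is."""
    return value if value == "unlimited" else int(value)


def open_fd_count(pid="self"):
    """Number of file descriptors a process currently has open (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def open_file_limits(pid="self"):
    """(soft, hard) 'Max open files' limits of a process, read from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Line layout: "Max open files  <soft>  <hard>  files"
                soft, hard = line.split()[3:5]
                return _parse_limit(soft), _parse_limit(hard)
    raise RuntimeError("no 'Max open files' line found")


def double_own_soft_limit():
    """Double THIS process's soft limit, capped at the hard limit.

    Affects only the current process and children it starts afterwards;
    an unprivileged process may raise its soft limit up to the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    new_soft = soft * 2
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    # Pass the harvester JVM's PID as an argument; defaults to this process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    soft, hard = open_file_limits(pid)
    print(f"pid {pid}: {open_fd_count(pid)} open files, soft limit {soft}, hard limit {hard}")
```

Run it against the harvester's PID during a crawl. If the open-file count keeps approaching the soft limit, the launch configuration needs a higher value.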





From:        "Soleto Ruiz de Clavijo, Miguel" <miguel.soleto at externos.bne.es>
To:        "'netarchivesuite-users at ml.sbforge.org'" <netarchivesuite-users at ml.sbforge.org>
Cc:        "García Arratia, Juan Carlos" <juancarlos.garcia at bne.es>, "Monzón, Fernando" <f.monzon at bne.es>
Date:        20/09/2021 09:42
Subject:        [Netarchivesuite-users] Problem with a spider
Sent by:        "NetarchiveSuite-users" <netarchivesuite-users-bounces at ml.sbforge.org>
________________________________



Hi!
I’m Miguel Soleto, from the National Library of Spain. We had a problem with a spider this weekend. Although the spider was running, we couldn’t see anything in the interface. Here is what I found in the log (heritrix3_err.log):

2021-09-20 07:02:37.245 INFO thread-21 org.archive.crawler.reporting.StatisticsTracker.writeReportFile() wrote report: /netarchive/BNE/harvester_high/71215_1632093369236/heritrix3/./jobs/71215_1632093369236/20210919231620/reports/processors-report.txt
2021-09-20 07:02:37.245 GRAVE thread-21 org.archive.crawler.reporting.StatisticsTracker.writeReportFile() Unable to write /netarchive/BNE/harvester_high/71215_1632093369236/heritrix3/./jobs/71215_1632093369236/20210919231620/reports/frontier-summary-report.txt at the end of crawl.
java.io.FileNotFoundException: /netarchive/BNE/harvester_high/71215_1632093369236/heritrix3/./jobs/71215_1632093369236/20210919231620/reports/frontier-summary-report.txt (Demasiados ficheros abiertos)
               at java.io.FileOutputStream.open0(Native Method)
               at java.io.FileOutputStream.open(FileOutputStream.java:270)
               at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
               at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
               at java.io.FileWriter.<init>(FileWriter.java:90)
               at org.archive.crawler.reporting.StatisticsTracker.writeReportFile(StatisticsTracker.java:897)
               at org.archive.crawler.reporting.StatisticsTracker.dumpReports(StatisticsTracker.java:926)
               at org.archive.crawler.reporting.StatisticsTracker.stop(StatisticsTracker.java:342)
               at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:236)
               at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:213)
               at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:213)
               at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:213)

Highlighted in red is “Demasiados ficheros abiertos”, which means “Too many open files”.
Has anyone had a problem like this? Is there a way to avoid it?

Thank you all!

Best Regards.
________________________________
The information in this e-mail and any attachments is confidential and it is intended for the addressee only. If you have received this e-mail in error, you are notified that any revision, amendment, print, copy, disclosure, distribution or use of the contents is unauthorized. Carrying out any of the above actions, is expressly banned by rules governing this matter. Hence we request that if you are not the intended recipient, please notify the sender answering this e-mail, and delete the message and any attachments. The National Library of Spain reserves itself the right to take the appropriate legal actions in the event of the above mentioned matter is being infringed.
________________________________
_______________________________________________
NetarchiveSuite-users mailing list
NetarchiveSuite-users at ml.sbforge.org
https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users
________________________________
