<font size=2 face="sans-serif">Regarding the limit, if you talked to Tue
about crawler configuration, then you're probably OK.</font><br><font size=2 face="sans-serif">Generation of all the snapshot jobs takes
precedence over the generation of selective ones.</font><br><font size=2 face="sans-serif">At BnF, before we launch the broad crawl,
we make sure our daily crawls have started, because the whole generation
for about 1000 jobs takes between 4 and 5 hours.</font><br><font size=2 face="sans-serif">If you have a snapshot
harvest controller that is truly available (with no grey dot), then the second
job should start. </font><br><font size=2 face="sans-serif">Common problems (at least some we encountered)
are:</font><br><font size=2 face="sans-serif">- access problems with the arc repository
</font><br><font size=2 face="sans-serif">- unwanted characters in seed lists
causing the deactivation of the harvest definition</font><br><font size=2 face="sans-serif">- broker out of memory</font><br><br><font size=2 face="sans-serif">Sara</font><br><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">"Peter Svanberg"
<Peter.Svanberg@kb.se></font><br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">"netarchivesuite-users@ml.sbforge.org"
<netarchivesuite-users@ml.sbforge.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">13/09/2019 18:44</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [Netarchivesuite-users]
NAS broad crawl questions</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">"NetarchiveSuite-users"
<netarchivesuite-users-bounces@ml.sbforge.org></font><br><hr noshade><br><br><br><font size=3>10000 is what the default limits give. Should we change
that?</font><br><br><font size=3>One job started and ended, but the next snapshot job didn’t
start. That’s what is strange.</font><br><br><font size=3>Later, no selective job started either. Everything
seems to have stopped/paused, except snapshot job creation.</font><br><br><font size=3>We will dig further into the logs etc.</font><br><br><font size=3>/Peter</font><br><font size=3><br>On 13 Sep 2019, at 18:15, "</font><a href=mailto:sara.aubry@bnf.fr><font size=3 color=blue><u>sara.aubry@bnf.fr</u></font></a><font size=3>"
<</font><a href=mailto:sara.aubry@bnf.fr><font size=3 color=blue><u>sara.aubry@bnf.fr</u></font></a><font size=3>> wrote:<br></font><br><font size=2 face="sans-serif">Hello Peter,</font><font size=3><br></font><font size=2 face="sans-serif"><br>That's great news, just the start of a big adventure!<br>Just about everything is likely to happen during the first broad crawl!</font><font size=3><br></font><font size=2 face="sans-serif"><br>10 000 domains per job is quite big; we do only 5 000, but you probably
have big crawlers.</font><font size=3><br></font><font size=2 face="sans-serif"><br>If you only had a single crawler started on the Snapshot channel, it's
normal that only one job started.<br>That's very cautious. We also do this to make sure that we don't fail about
1000 jobs in a row...</font><font size=3><br></font><font size=2 face="sans-serif"><br>A grey dot with no hostname means that your job is over and is being post-processed,
with data transferred to the arc repository.<br>To check on this, look at the end of your HarvestController log file.<br>If everything went well, you can start another crawler, check that you are
crawling well, and then launch your other crawlers.</font><font size=3><br></font><font size=2 face="sans-serif"><br>Job generation can take quite a long time.</font><font size=3><br></font><font size=2 face="sans-serif"><br>Best,<br><br>Sara</font><font size=3><br><br><br><br><br></font><font size=1 color=#5f5f5f face="sans-serif"><br>From: </font><font size=1 face="sans-serif">"Peter
Svanberg" <</font><a href=mailto:Peter.Svanberg@kb.se><font size=1 color=blue face="sans-serif"><u>Peter.Svanberg@kb.se</u></font></a><font size=1 face="sans-serif">></font><font size=1 color=#5f5f5f face="sans-serif"><br>To: </font><font size=1 face="sans-serif">"</font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=1 color=blue face="sans-serif"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=1 face="sans-serif">"
<</font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=1 color=blue face="sans-serif"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=1 face="sans-serif">></font><font size=1 color=#5f5f5f face="sans-serif"><br>Date: </font><font size=1 face="sans-serif">13/09/2019
18:03</font><font size=1 color=#5f5f5f face="sans-serif"><br>Subject: </font><font size=1 face="sans-serif">[Netarchivesuite-users]
NAS broad crawl questions</font><font size=1 color=#5f5f5f face="sans-serif"><br>Sent by: </font><font size=1 face="sans-serif">"NetarchiveSuite-users"
<</font><a href="mailto:netarchivesuite-users-bounces@ml.sbforge.org"><font size=1 color=blue face="sans-serif"><u>netarchivesuite-users-bounces@ml.sbforge.org</u></font></a><font size=1 face="sans-serif">></font><font size=3><br></font><hr noshade><font size=3><br><br></font><font size=3 face="Calibri"><br>This Wednesday at 11:02 we started our first NAS broad crawl, tadaa! (Pär
has pictures showing Thomas and me pressing the mouse button, clicking on
“Activate”.)<br> <br>It started well, with the job creation process. The first job, though,
contained only one domain – maybe because it was special, with lots of
non-default seeds. Then there was job two, containing 9999 domains, and
then the process continued, with 10000 domains in each job.<br> <br>After that, the first snapshot job started running. But after it was finished,
no more snapshot jobs were started.<br> <br>Later, our selective harvests started and ran as scheduled. But when they
were finished, nothing seems to happen in the job finishing and job starting
area. The “All Running Jobs” page just contains job rows with a grey
dot (crawl finished) and no host name. But the job creation process continues,
with soon 100 jobs of 10000 domains each.<br> <br>1) Do you have any hints on what could have happened? Is
the admin host so occupied with job creation that it can’t handle anything
else? But it wasn’t during the first hours. Where could we look to find
out what could be wrong? (In log files, of course, but what should we look
for?)<br> <br>We will let the job creation finish (which will happen at approximately
Sunday after 18:00) and see what happens then.<br> <br>Then, concerning starting a broad crawl:<br> <br>2) We were advised to just have one harvester process running
when the snapshot harvest is activated, which we did. But when could more
processes be started? After the first snapshot job is started? Or should
we wait until all jobs are created?<br> <br>Regards,<br> </font><font size=3 face="Arial"><br>-----<br><br>Peter Svanberg<br>Technical officer<br>Digital Collections Department, Newspapers, Radio and Television Division<br><br>National Library of Sweden<br>PO Box 5039 <br>SE-104 51 Stockholm<br>Visits: Karlavägen 100, Stockholm <br>Phone: +46 10 709 32 78<br><br>E-mail</font><font size=3 face="Calibri">: </font><a href=mailto:peter.svanberg@kb.se><font size=3 color=blue face="Arial"><u>peter.svanberg@kb.se</u></font></a><font size=3 face="Arial"><br>Web</font><font size=3 face="Calibri">: </font><a href=www.kb.se><font size=3 color=blue face="Arial"><u>www.kb.se</u></font></a><font size=3><br></font><font size=3 face="Calibri"><br> <br> </font><tt><font size=2>_______________________________________________<br>NetarchiveSuite-users mailing list</font></tt><tt><font size=2 color=blue><u><br></u></font></tt><a href="mailto:NetarchiveSuite-users@ml.sbforge.org"><tt><font size=2 color=blue><u>NetarchiveSuite-users@ml.sbforge.org</u></font></tt></a><font size=3 color=blue><u><br></u></font><a href="https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users"><tt><font size=2 color=blue><u>https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users</u></font></tt></a><font size=3><br></font><font size=3 face="sans-serif"><br></font><hr><p><a href="https://www.bnf.fr/fr/actualites/journees-europeennes-du-patrimoine-2019"><font size=3 color=blue face="sans-serif"><b><i><u>Journées
européennes du patrimoine 2019</u></i></b></font></a><font size=3 face="sans-serif">- Saturday 21 and Sunday 22 September at the BnF sites</font><p><font size=3 color=#008000 face="sans-serif"><b>Before printing, think of
the environment.</b></font>