<font size=2 face="sans-serif">Hi Peter,</font><br><br><font size=3 color=#004080 face="Calibri">>How do you distribute
this file to each job directory?</font><br><font size=3 color=#004080 face="Calibri">>In Sara’s example there
was an absolute path, </font><font size=3 face="Courier New">/dlweb/data/nas/exclude.txt</font><font size=3 color=#004080 face="Calibri">, is that a common file system available on all harvester hosts? We don’t
have that at the moment.</font><br><br><font size=2 face="sans-serif">The answer is yes. Our crawlers share
common disk spaces and </font><font size=3 face="Courier New">/dlweb/data/nas/</font><font size=2 face="sans-serif"> is one of them.</font><br><font size=2 face="sans-serif">If you don't have one, copying the
updated file to the crawler file system will also work. You can change
the content of this file while the crawl is running.</font><br><br><font size=3 color=#004080 face="Calibri"> SurtPrefixedDecideRule
</font><font size=2 face="sans-serif">is a DecideRule, so I guess its placement depends
on how your rules are organized.</font><br><br><font size=2 face="sans-serif">Thanks for letting us know about the
</font><font size=3 color=#004080 face="Calibri">surtsSource</font><font size=2 face="sans-serif">.</font><br><br><font size=2 face="sans-serif">Best,</font><br><br><font size=2 face="sans-serif">Sara</font><br><br><br><br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">"Peter Svanberg"
<Peter.Svanberg@kb.se></font><br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">"netarchivesuite-users@ml.sbforge.org"
<netarchivesuite-users@ml.sbforge.org></font><br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">15/11/2023 11:50</font><br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [Netarchivesuite-users]
File to exclude domains?</font><br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">"NetarchiveSuite-users"
<netarchivesuite-users-bounces@ml.sbforge.org></font><br><hr noshade><br><br><br><font size=3 color=#004080 face="Calibri">A followup and simple(?)
question related to the answer:</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">When you specify files to
be read from a bean, specified with just a filename, no path – how do
you distribute this file to each job? I fail to find a way to do this.</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">The example was the SurtPrefixedDecideRule
bean, where the file was specified as</font><br><font size=3 face="Courier New"><property name="surtsSourceFile"
value="exclude.txt" /></font><br><font size=3 color=#004080 face="Calibri">How do you distribute this
file to each job directory?</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">In Sara’s example there
was an absolute path, </font><font size=3 face="Courier New">/dlweb/data/nas/exclude.txt</font><font size=3 color=#004080 face="Calibri">, is that a common file system available on all harvester hosts? We don’t
have that at the moment.</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">Also, I suppose the
already present bean in the standard SCOPE sequence:</font><br><font size=3 color=#004080 face="Calibri"> <!--
...but REJECT those from a configurable (initially empty) set of REJECT
SURTs... --></font><br><font size=3 color=#004080 face="Calibri">is the correct place for
this SurtPrefixedDecideRule bean? </font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">BTW, according to the Heritrix
source code, surtsSourceFile is deprecated; you should use surtsSource instead,
like this:</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri"><property name="surtsSource"></font><br><font size=3 color=#004080 face="Calibri"> <bean class="org.archive.spring.ConfigFile"></font><br><font size=3 color=#004080 face="Calibri">
<property name="path" value="exclude.txt" /></font><br><font size=3 color=#004080 face="Calibri"> </bean></font><br><font size=3 color=#004080 face="Calibri"></property></font><br><font size=3 face="Arial"> </font><br><font size=3 face="Arial"> </font><br><font size=3 face="Arial">-----<br>Peter Svanberg</font><font size=3 color=#004080 face="Calibri"><br></font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 face="Calibri"><b>From:</b> Peter Svanberg <b><br>Sent:</b> 20 October 2023 18:54<b><br>To:</b> </font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=3 color=blue face="Calibri"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=3 face="Calibri"><b><br>Subject:</b> SV: [Netarchivesuite-users] File to exclude domains?</font><br><font size=3 face="Times New Roman"> </font><br><font size=3 color=#004080 face="Calibri">Thank you Sara and Bert!
</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 color=#004080 face="Calibri">I was under the impression
that it was some special treatment outside of the decideRule system, but
this is perfect!</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 face="Arial">-----<br>Peter Sv.</font><br><font size=3 color=#004080 face="Calibri"> </font><br><font size=3 face="Calibri"><b>From:</b> NetarchiveSuite-users <</font><a href="mailto:netarchivesuite-users-bounces@ml.sbforge.org"><font size=3 color=blue face="Calibri"><u>netarchivesuite-users-bounces@ml.sbforge.org</u></font></a><font size=3 face="Calibri">>
<b>On behalf of </b></font><a href=mailto:sara.aubry@bnf.fr><font size=3 color=blue face="Calibri"><u>sara.aubry@bnf.fr</u></font></a><font size=3 face="Calibri"><b><br>Sent:</b> 20 October 2023 09:29<b><br>To:</b> </font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=3 color=blue face="Calibri"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=3 face="Calibri"><b><br>Subject:</b> Re: [Netarchivesuite-users] File to exclude domains?</font><br><font size=3 face="Times New Roman"> </font><br><font size=3 face="Arial">Hello Peter,</font><font size=3 face="Times New Roman"><br></font><font size=3 face="Arial"><br>We use the Heritrix exclude.txt mechanism, which you can activate with the following
bean in your profile:</font><font size=3 face="Times New Roman"><br></font><font size=3 face="Courier New"><br><bean id="rejectExcludedSurts" class="org.archive.modules.deciderules.surt.SurtPrefixedDecideRule"><br> <!-- Decision value (ACCEPT, REJECT, NONE) --><br> <property name="decision" value="REJECT"
/><br> <property name="surtsSourceFile" value="/dlweb/data/nas/exclude.txt"
/><br> <property name="seedsAsSurtPrefixes" value="false"
/><br> <property name="alsoCheckVia" value="false"
/><br> <property name="surtsDumpFile" value="/dlweb/data/nas/exclude.dump"
/><br></bean></font><font size=3 face="Times New Roman"><br></font><font size=3 face="Arial"><br>Best,</font><font size=3 face="Times New Roman"><br></font><font size=3 face="Arial"><br>Sara</font><font size=3 face="Times New Roman"><br><br><br><br></font><font size=3 color=#5f5f5f face="Arial"><br>From: </font><font size=3 face="Arial">"Peter
Svanberg" <</font><a href=mailto:Peter.Svanberg@kb.se><font size=3 color=blue face="Arial"><u>Peter.Svanberg@kb.se</u></font></a><font size=3 face="Arial">></font><font size=3 color=#5f5f5f face="Arial"><br>To: </font><font size=3 face="Arial">"</font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=3 color=blue face="Arial"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=3 face="Arial">"
<</font><a href="mailto:netarchivesuite-users@ml.sbforge.org"><font size=3 color=blue face="Arial"><u>netarchivesuite-users@ml.sbforge.org</u></font></a><font size=3 face="Arial">></font><font size=3 color=#5f5f5f face="Arial"><br>Date: </font><font size=3 face="Arial">19/10/2023
21:11</font><font size=3 color=#5f5f5f face="Arial"><br>Subject: </font><font size=3 face="Arial">[Netarchivesuite-users]
File to exclude domains?</font><font size=3 color=#5f5f5f face="Arial"><br>Sent by: </font><font size=3 face="Arial">"NetarchiveSuite-users"
<</font><a href="mailto:netarchivesuite-users-bounces@ml.sbforge.org"><font size=3 color=blue face="Arial"><u>netarchivesuite-users-bounces@ml.sbforge.org</u></font></a><font size=3 face="Arial">></font><div align=center><hr noshade></div><br><font size=3 face="Times New Roman"><br><br></font><font size=3 face="Calibri"><br>I have a definite recollection of Sara talking about a file you can create
containing domain names to be excluded from a snapshot. But I can't find
any info on that anywhere. (Other than NAS-1725 but not what was done with
that.) Can someone remind me?<br> <br>(I know you can configure with zeros but a list in a file would be easier.)</font><font size=3 face="Arial"><br>-----<br><br>Peter Svanberg<br>National Library of Sweden</font><font size=3 face="Courier New"><br>_______________________________________________<br>NetarchiveSuite-users mailing list</font><font size=3 color=blue face="Times New Roman"><u><br></u></font><a href="mailto:NetarchiveSuite-users@ml.sbforge.org"><font size=3 color=blue face="Courier New"><u>NetarchiveSuite-users@ml.sbforge.org</u></font></a><font size=3 color=blue face="Times New Roman"><u><br></u></font><a href="https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users"><font size=3 color=blue face="Courier New"><u>https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users</u></font></a><div align=center><hr></div><p><font size=3 face="Arial">Expositions </font><a href="https://www.bnf.fr/fr/agenda/epreuves-de-la-matiere"><font size=3 color=blue face="Arial"><b><i><u>Épreuves
de la matière</u></i></b></font></a><font size=3 face="Arial"> from 10 October
2023 to 4 February 2024 and </font><a href="https://www.bnf.fr/fr/agenda/noir-blanc-une-esthetique-de-la-photographie"><font size=3 color=blue face="Arial"><b><i><u>Noir
& Blanc : une esthétique de la photographie</u></i></b></font></a><font size=3 face="Arial"> from 17 October 2023 to 21 January 2024 | François-Mitterrand.</font><p><a href="https://www.bnf.fr/fr/participez-lacquisition-du-breviaire-de-charles-v"><font size=3 color=blue face="Arial"><b><u>Help
acquire the breviary of Charles V, a very rare illuminated manuscript
from the 14th century</u></b></font></a><p><font size=3 color=#008000 face="Arial"><b>Please consider the environment
before printing.</b></font><tt><font size=2>_______________________________________________<br>NetarchiveSuite-users mailing list<br>NetarchiveSuite-users@ml.sbforge.org<br></font></tt><a href="https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users"><tt><font size=2>https://ml.sbforge.org/mailman/listinfo/netarchivesuite-users</font></tt></a><tt><font size=2><br></font></tt><p><font face="sans-serif"><hr />
</font>