<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Mangal;
panose-1:0 0 4 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
tt
{mso-style-priority:99;
font-family:"Courier New";}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.E-postmall20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.E-postmall21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="SV" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">A follow-up and simple(?) question related to the answer</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">When a bean reads a file that is specified by filename only, with no path, how do you distribute that file to each job? I have not found a way to do this.</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">The example was the SurtPrefixedDecideRule bean, where the file was specified as<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:65.2pt"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New""><property name="surtsSourceFile" value="exclude.txt" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">How do you distribute this file to each job directory?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Sara’s example used an absolute path,
</span><tt><span lang="EN-GB" style="font-size:10.0pt">/dlweb/data/nas/exclude.txt</span></tt><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">. Is that on a common file system available on
all harvester hosts? We don’t have one at the moment.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Also, I suppose the bean already present in the standard SCOPE sequence, introduced by this comment:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"> <!-- ...but REJECT those from a configurable (initially empty) set of REJECT SURTs... --><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">is the correct place for this SurtPrefixedDecideRule bean?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">BTW, according to the Heritrix source code, surtsSourceFile is deprecated; you should use surtsSource instead, like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><property name="surtsSource"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"> <bean class="org.archive.spring.ConfigFile"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"> <property name="path" value="exclude.txt" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"> </bean><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"></property><o:p></o:p></span></p>
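<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Put together, Sara’s reject bean rewritten to use surtsSource might look roughly like the sketch below. It is untested: the bean id and the decision, seedsAsSurtPrefixes and alsoCheckVia values are copied from Sara’s example, the path exclude.txt is only the placeholder used above, and surtsDumpFile is left out for brevity.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New""><bean id="rejectExcludedSurts" class="org.archive.modules.deciderules.surt.SurtPrefixedDecideRule"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <!-- Reject any URI whose SURT form starts with a prefix listed in the source file --><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <property name="decision" value="REJECT" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <property name="seedsAsSurtPrefixes" value="false" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <property name="alsoCheckVia" value="false" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <!-- Used instead of the deprecated surtsSourceFile; the path value is a placeholder --><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  <property name="surtsSource"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">    <bean class="org.archive.spring.ConfigFile"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">      <property name="path" value="exclude.txt" /><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">    </bean><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New"">  </property><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Courier New""></bean><o:p></o:p></span></p>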
<div>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black">-----<br>
</span><span style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">Peter Svanberg</span><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><br>
<br>
</span><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Peter Svanberg
<br>
<b>Sent:</b> 20 October 2023 18:54<br>
<b>To:</b> <a href="mailto:netarchivesuite-users@ml.sbforge.org">netarchivesuite-users@ml.sbforge.org</a><br>
<b>Subject:</b> RE: [Netarchivesuite-users] File to exclude domains?<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Thank you, Sara and Bert!
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I was under the impression that it was some special treatment outside of the decideRule system, but this is perfect!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black">-----<br>
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:"Arial",sans-serif;color:black">Peter Sv.</span><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif"> NetarchiveSuite-users <</span><a href="mailto:netarchivesuite-users-bounces@ml.sbforge.org"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif">netarchivesuite-users-bounces@ml.sbforge.org</span></a><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif">>
<b>On behalf of </b></span><a href="mailto:sara.aubry@bnf.fr"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif">sara.aubry@bnf.fr</span></a><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Sent:</b> 20 October 2023 09:29<br>
<b>To:</b> </span><a href="mailto:netarchivesuite-users@ml.sbforge.org"><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif">netarchivesuite-users@ml.sbforge.org</span></a><span lang="EN-GB" style="font-size:11.0pt;font-family:"Calibri",sans-serif"><br>
<b>Subject:</b> Re: [Netarchivesuite-users] File to exclude domains?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:"Arial",sans-serif">Hello Peter,</span><span lang="EN-GB"><br>
<br>
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:"Arial",sans-serif">We use Heritrix's exclude.txt mechanism, which you can activate with the following bean in your profile:</span><span lang="EN-GB"><br>
<br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"><bean id="rejectExcludedSurts" class="org.archive.modules.deciderules.surt.SurtPrefixedDecideRule"></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <!-- Decision value (ACCEPT, REJECT, NONE) --></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <property name="decision" value="REJECT" /></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <property name="surtsSourceFile" value="/dlweb/data/nas/exclude.txt" /></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <property name="seedsAsSurtPrefixes" value="false" /></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <property name="alsoCheckVia" value="false" /></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"> <property name="surtsDumpFile" value="/dlweb/data/nas/exclude.dump" /></span></tt><span lang="EN-GB" style="font-family:"Courier New""><br>
</span><tt><span lang="EN-GB" style="font-size:10.0pt"></bean></span></tt><span lang="EN-GB"><br>
<br>
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:"Arial",sans-serif">Best,</span><span lang="EN-GB"><br>
<br>
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:"Arial",sans-serif">Sara</span><span lang="EN-GB"><br>
<br>
<br>
<br>
<br>
</span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">From: </span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">"Peter Svanberg" <</span><a href="mailto:Peter.Svanberg@kb.se"><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">Peter.Svanberg@kb.se</span></a><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">></span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">To: </span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">"</span><a href="mailto:netarchivesuite-users@ml.sbforge.org"><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">netarchivesuite-users@ml.sbforge.org</span></a><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">"
<</span><a href="mailto:netarchivesuite-users@ml.sbforge.org"><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">netarchivesuite-users@ml.sbforge.org</span></a><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">></span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Date: </span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">19/10/2023 21:11</span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Subject: </span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">[Netarchivesuite-users] File to exclude domains?</span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif;color:#5F5F5F">Sent by: </span><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">"NetarchiveSuite-users" <</span><a href="mailto:netarchivesuite-users-bounces@ml.sbforge.org"><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">netarchivesuite-users-bounces@ml.sbforge.org</span></a><span lang="EN-GB" style="font-size:7.5pt;font-family:"Arial",sans-serif">></span><span lang="EN-GB"><o:p></o:p></span></p>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="3" width="100%" noshade="" style="color:#A0A0A0" align="center">
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span lang="EN-GB"><br>
<br>
<br>
</span><span lang="EN-GB" style="font-family:"Calibri",sans-serif">I definitely recall Sara talking about a file you can create that contains domain names to be excluded from a snapshot, but I can't find any information about it anywhere. (I found only NAS-1725,
but not what was done with it.) Can someone remind me?</span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-family:"Calibri",sans-serif"> </span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-family:"Calibri",sans-serif">(I know you can configure with zeros but a list in a file would be easier.)</span><span lang="EN-GB"><br>
</span><span lang="EN-GB" style="font-family:"Arial",sans-serif">-----<br>
<br>
Peter Svanberg<br>
National Library of Sweden</span><span lang="EN-GB" style="font-family:"Calibri",sans-serif"><br>
</span><span lang="EN-GB" style="font-family:"Arial",sans-serif"><o:p></o:p></span></p>
</div>
</body>
</html>