<div dir="ltr">Hi Andreas,<div><br></div><div>At BnF, we use the new ExtractorJS without any properties:</div><div><font size="2" face="sans-serif"><bean id="extractorJs" class="org.archive.modules.extractor.ExtractorJS"></font><br><font size="2" face="sans-serif"></bean></font> <br></div><div><br></div><div>Instead of the Icelandic one:</div><div><font size="2" face="sans-serif"><bean id="icelandicExtractorJs" class="dk.netarkivet.harvester.harvesting.extractor.IcelandicExtractorJS"></font><br><font size="2" face="sans-serif"> Possible to define this value in NetarchiveSuite GUI </font><br><font size="2" face="sans-serif"> <property name="enabled" value="true" /></font><br><font size="2" face="sans-serif"> <property name="rejectRelativeMatchingRegexList"></font><br><font size="2" face="sans-serif"> <list></font><br><font size="2" face="sans-serif"> <value>^text/javascript$</value></font><br><font size="2" face="sans-serif"> <value>^text/css$</value></font><br><font size="2" face="sans-serif"> <value>^a\.[^/]+$</value></font><br><font size="2" face="sans-serif"> <value>^div\.[^/]+$</value></font><br><font size="2" face="sans-serif"> E.g. 3.5.0. Very common in some JS libraries for strings of this nature but very unlikely to be a relative URL</font><br><font size="2" face="sans-serif"> <value>^[0-9]\.([0-9]\.)[0-9]$</value> </font><br><font size="2" face="sans-serif"> <value>^Microsoft\.XMLHTTP$</value> </font><br><font size="2" face="sans-serif"> </list></font><br><font size="2" face="sans-serif"> </property></font><br><font size="2" face="sans-serif"> </bean></font> <br></div><div><br></div><div>I don't know whether the default ExtractorJS has a rejectRelativeMatchingRegexList property.</div><div><br></div><div>Best,</div><div>Clara</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 8 Jul 
2020 at 10:04, <<a href="mailto:aponb@gmx.at">aponb@gmx.at</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Getting back to yesterday's call: the suggestion for NAS 6.0 is
now to go back to the default ExtractorJS, i.e. to use <br>
</p>
<pre><bean id="extractorJs" class="org.archive.modules.extractor.ExtractorJS">
</bean>
</pre>
<p>instead of</p>
<pre><bean id="icelandicExtractorJs" class="dk.netarkivet.harvester.harvesting.extractor.IcelandicExtractorJS">
</bean>
</pre>
<p>Does the default ExtractorJS have the same properties as the
IcelandicExtractorJS, such as rejectRelativeMatchingRegexList?<br>
</p>
<p>This would be a perfect sample for a NAS-Knowledge-DB!</p>
<p>Regards</p>
<p>a.<br>
</p>
</div>
_______________________________________________<br>
Netarchivesuite-devel mailing list<br>
<a href="mailto:Netarchivesuite-devel@ml.sbforge.org" target="_blank">Netarchivesuite-devel@ml.sbforge.org</a><br>
<a href="https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel" rel="noreferrer" target="_blank">https://ml.sbforge.org/mailman/listinfo/netarchivesuite-devel</a><br>
</blockquote></div>
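<p>To get a feel for what the rejection patterns quoted in this thread filter out, here is a minimal sketch that runs them against a few candidate strings outside the crawler. It is not Heritrix or NetarchiveSuite code: the helper name <code>is_rejected</code> and the sample strings are made up for illustration, and it assumes that a candidate is rejected as soon as any one pattern matches. The patterns themselves are copied unchanged from the IcelandicExtractorJS configuration above; the crawler evaluates them as Java regular expressions, which behave the same as Python's for these simple anchored patterns.</p>

```python
import re

# Patterns copied verbatim from the rejectRelativeMatchingRegexList
# shown in the IcelandicExtractorJS bean in this thread.
REJECT_PATTERNS = [
    r"^text/javascript$",
    r"^text/css$",
    r"^a\.[^/]+$",
    r"^div\.[^/]+$",
    r"^[0-9]\.([0-9]\.)[0-9]$",
    r"^Microsoft\.XMLHTTP$",
]

_COMPILED = [re.compile(p) for p in REJECT_PATTERNS]


def is_rejected(candidate: str) -> bool:
    """Hypothetical helper (not a Heritrix API): True if any pattern matches.

    Assumption: one matching pattern is enough to reject the candidate.
    """
    return any(rx.search(candidate) for rx in _COMPILED)


if __name__ == "__main__":
    # Sample strings are illustrative only, not taken from a real crawl.
    for s in ["text/css", "3.5.0", "a.button", "Microsoft.XMLHTTP",
              "js/app.js", "10.2.1"]:
        print(f"{s!r}: rejected={is_rejected(s)}")
```

<p>One thing the sketch makes visible: the version-number pattern only covers single-digit components, so a string such as <code>10.2.1</code> is not matched by it.</p>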