[Netarchivesuite-devel] Domain name validity check
nicolas.giraud at bnf.fr
Fri Dec 18 16:03:20 CET 2009
Hi,
I am currently compiling domains and seeds to include in our broad crawl,
and I do not quite understand why domain names containing a "-" (neither
leading nor trailing) are considered invalid. The code defines the regexp
DomainUtils#DOMAINNAME_CHAR_REGEX_STRING, which lists invalid chars
according to RFC3490. However, domains such as
"www.editions-debaisieux.fr" are perfectly valid (and registered) and
still do not pass NAS validation... Why not simply rely on java.net.URL,
which would also avoid having to hard-code a bunch of TLDs (some of which
are actually primary domains) in the settings? I am rather perplexed;
this validation is causing me a lot of trouble. Would it be OK to change
the code?
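To illustrate the kind of check I have in mind, here is a minimal sketch
of a hostname test that accepts hyphens inside a label but rejects them
at the start or end of one. The class and method names (HostnameCheck,
isValidHostname) are hypothetical and are not existing NAS code; the
length limits (63 characters per label, 253 overall) are the usual DNS
ones, and the sketch assumes plain ASCII names with at least one dot.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of NetarchiveSuite.
public class HostnameCheck {
    // One label: starts and ends with a letter or digit, may contain
    // hyphens in between, 1 to 63 characters long.
    private static final String LABEL =
            "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";

    // At least two labels separated by dots (so a bare "localhost" fails).
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + LABEL + "\\.)+" + LABEL);

    public static boolean isValidHostname(String name) {
        return name != null
                && name.length() <= 253
                && HOSTNAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("www.editions-debaisieux.fr")); // true
        System.out.println(isValidHostname("-editions.fr"));               // false
        System.out.println(isValidHostname("editions-.fr"));               // false
    }
}
```

This does not handle internationalized names; those would first need to
be converted to their ASCII form.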
Regards,
Nicolas