[Netarchivesuite-devel] Domain name validity check

nicolas.giraud at bnf.fr
Fri Dec 18 16:03:20 CET 2009


Hi,

I am currently compiling domains and seeds to include in
our broad crawl, and I don't quite understand why domain names containing a
"-" (neither leading nor trailing) are considered invalid. The code defines
the regexp DomainUtils#DOMAINNAME_CHAR_REGEX_STRING, which lists invalid
chars according to RFC3490. However, domains such as
"www.editions-debaisieux.fr" are perfectly valid (and registered), yet
they still do not pass NAS validation. Why not simply rely on
java.net.URL, which would also avoid having to hard-code a bunch of TLDs
(some of which are actually primary domains) in the settings? I am
rather perplexed; this validation is causing me a lot of trouble.
Would it be OK to change the code?
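A minimal sketch of the kind of check being proposed, for illustration only. It assumes the usual letter-digit-hyphen host-name rule (RFC 1123), under which a hyphen is allowed inside a label but not at its start or end, and labels are 1 to 63 characters. The class and method names (HostnameCheck, isValidHostname, isValidSeedHost) are hypothetical and not part of NetArchiveSuite; the actual contents of DOMAINNAME_CHAR_REGEX_STRING are not reproduced here. Note that java.net.URL only extracts the host and, as far as I know, does little validation of it, so a label check is still needed on top. Internationalized names would first have to be converted to their ASCII form (e.g. with java.net.IDN.toASCII).

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of NetArchiveSuite): checks host names
 * with the letter-digit-hyphen rule, i.e. hyphens are allowed inside a
 * label but not as its first or last character.
 */
public final class HostnameCheck {

    // One label: 1-63 chars, alphanumeric at both ends, hyphens only inside.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    private HostnameCheck() {}

    /** Returns true if every dot-separated label of the host passes the rule. */
    public static boolean isValidHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // limit -1 keeps empty labels, so "a..b" and "a.b." are rejected
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    /** Extracts the host from a seed with java.net.URL, then checks it. */
    public static boolean isValidSeedHost(String seed) {
        try {
            return isValidHostname(new URL(seed).getHost());
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("www.editions-debaisieux.fr"));        // true
        System.out.println(isValidHostname("-editions.fr"));                      // false
        System.out.println(isValidHostname("editions-.fr"));                      // false
        System.out.println(isValidSeedHost("http://www.editions-debaisieux.fr/")); // true
    }
}
```

With a check along these lines, "www.editions-debaisieux.fr" is accepted while labels with a leading or trailing hyphen are still rejected, and no list of TLDs is needed for the syntactic part of the validation.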

Regards,

Nicolas




