[Netarchivesuite-devel] Domain name validity check

Søren Vejrup Carlsen svc at kb.dk
Fri Dec 18 16:16:48 CET 2009


Hi Nicolas.
The regexp DomainUtils#DOMAINNAME_CHAR_REGEX_STRING actually defines the valid domain names, despite the heading "Invalid ..".
So editions-debaisieux.fr should be considered a valid domain.
www.editions-debaisieux.fr is not a domain, but a host under the domain editions-debaisieux.fr.
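The distinction between a domain and a host can be illustrated with a small Java sketch. The pattern below is *not* the actual DomainUtils#DOMAINNAME_CHAR_REGEX_STRING, which is not quoted in this thread; it is a hypothetical pattern, assuming a domain is exactly one LDH label (letters, digits and hyphens, with no leading or trailing hyphen) followed by a single ASCII-letter TLD such as "fr". The helper domainOf is also hypothetical: it keeps only the last two labels of a host name, so it would give the wrong answer for multi-label suffixes such as "co.uk".

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the NetarchiveSuite implementation. */
class DomainSketch {
    // One LDH label: starts and ends with a letter or digit, hyphens are
    // allowed only in between, at most 63 characters in total.
    private static final String LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";

    // Assumption: a "domain" is one label plus a single ASCII-letter TLD.
    private static final Pattern DOMAIN =
            Pattern.compile(LABEL + "\\.[a-z]{2,63}");

    static boolean isValidDomain(String name) {
        return DOMAIN.matcher(name.toLowerCase(Locale.ROOT)).matches();
    }

    /** Keeps the last two labels of a host, e.g. www.x.fr becomes x.fr. */
    static String domainOf(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        if (labels.length < 2) {
            throw new IllegalArgumentException("Not a host name: " + host);
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(isValidDomain("editions-debaisieux.fr"));     // true
        System.out.println(isValidDomain("www.editions-debaisieux.fr")); // false: a host
        System.out.println(domainOf("www.editions-debaisieux.fr"));      // editions-debaisieux.fr
    }
}
```

Under these assumptions an internal hyphen is accepted, while the "www." host is rejected as a domain and has to be reduced to its domain first.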

regards
Søren
---------------------------------------------------------------------------
Søren Vejrup Carlsen, NetarchiveSuite developer (and QA)
Department of Digital Preservation, Royal Library, Copenhagen, Denmark
tel.: (+45) 33 47 48 41
email: svc at kb.dk
----------------------------------------------------------------------------
Non omnia possumus omnes (We cannot all do everything)
--- Macrobius, Saturnalia, VI, 1, 35 -------



From: netarchivesuite-devel-bounces at lists.gforge.statsbiblioteket.dk on behalf of nicolas.giraud at bnf.fr
Sent: 18 December 2009 16:03
To: netarchivesuite-devel at lists.gforge.statsbiblioteket.dk
Subject: [Netarchivesuite-devel] Domain name validity check


Hi,

I am currently compiling domains and seeds to include in our broad crawl, and I do not quite understand why domain names containing a "-" (neither leading nor trailing) are considered invalid. The code defines the regexp DomainUtils#DOMAINNAME_CHAR_REGEX_STRING, which lists invalid characters according to RFC 3490. However, domains such as "www.editions-debaisieux.fr" are perfectly valid (and registered), yet they still do not pass NAS validation.

Why not simply rely on java.net.URL? That would also avoid having to hard-code a bunch of TLDs (some of which are actually primary domains) in the settings. I am rather perplexed; this validation is causing me a lot of trouble. Would it be OK to change the code?
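The idea of relying on the JDK can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed patch, and the class and method names are hypothetical. java.net.URL extracts the host from a seed, and java.net.IDN.toASCII with the IDN.USE_STD3_ASCII_RULES flag (available since Java 6) applies the RFC 3490 ToASCII checks, which reject non-LDH characters and leading or trailing hyphens in a label but accept internal hyphens. Neither class determines which part of a host is the registered domain, so this sketch covers syntax only.

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical illustration only; not the NetarchiveSuite implementation. */
class SeedCheckSketch {
    /** Returns the host part of a seed URL; unparseable seeds are rejected. */
    static String hostOf(String seed) {
        try {
            return new URL(seed).getHost();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unparseable seed: " + seed, e);
        }
    }

    /**
     * Runs RFC 3490 ToASCII with the STD3 rules: ASCII labels may contain
     * only letters, digits and hyphens, and may not start or end with a
     * hyphen. IDN.toASCII signals a violation with IllegalArgumentException.
     */
    static boolean isSyntacticallyValidHost(String host) {
        try {
            IDN.toASCII(host, IDN.USE_STD3_ASCII_RULES);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = hostOf("http://www.editions-debaisieux.fr/");
        System.out.println(host);                                // www.editions-debaisieux.fr
        System.out.println(isSyntacticallyValidHost(host));      // true
        System.out.println(isSyntacticallyValidHost("-bad.fr")); // false
    }
}
```

With this approach the character rules come from the JDK rather than a hand-written regexp; deciding which suffixes count as TLDs would still have to be handled separately.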

Regards,

Nicolas


Consider the environment before printing this mail.

