[Netarchivesuite-users] DNS in NAS/Heritrix
Peter Svanberg
Peter.Svanberg at kb.se
Tue Nov 2 09:26:18 CET 2021
Hello!
In preparing for a new broad crawl we are gathering information on new domain names, and a few issues have come up.
1) I remember reading that some of you have written scripts for handling domains and seeds. What were they for? What in this area does NAS/Heritrix not handle on its own?
2) One thing we have found is that NAS assumes a domain X answers at the URL http://www.X, but that is not always true. We found hundreds of domains that have no www.X host but do answer at http://X. Maybe NAS should be changed to handle this in some way?
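To illustrate the kind of check meant in 2), here is a minimal sketch in Python. It is not how NAS or Heritrix work today; the function names `host_resolves` and `pick_seed` are made up for this example, and it only tests whether a host name resolves in DNS, which does not guarantee that a web server actually answers there.

```python
import socket


def host_resolves(host):
    """Return True if DNS returns at least one address for the host name."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: name does not resolve; UnicodeError: invalid host label
        return False


def pick_seed(domain, resolves=host_resolves):
    """Hypothetical helper: prefer http://www.<domain>, fall back to
    http://<domain>, and return None if neither host name resolves."""
    for host in ("www." + domain, domain):
        if resolves(host):
            return "http://" + host
    return None
```

The `resolves` parameter is there so the lookup can be replaced, for example with a cached resolver or a stub when testing without network access. Calling `pick_seed("example.se")` would then give the seed URL to use for that domain, or None if the domain does not resolve at all.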
Kind regards
Peter Svanberg
Technical officer
Collection and Metadata
Collection 1
Kungliga biblioteket
Box 5039, 102 41 Stockholm
Visiting address: Karlavägen 96, Stockholm
+46 10 709 32 78
Peter.Svanberg at kb.se
https://www.kb.se/