[Netarchivesuite-users] DNS in NAS/Heritrix

Peter Svanberg Peter.Svanberg at kb.se
Tue Nov 2 09:26:18 CET 2021


Hello!

In preparing for a new broad crawl, we are gathering information on new domain names. Two questions have come up around that.

1) I remember reading that some of you had written scripts dealing with domains and seeds. What were they for? What in this area does NAS/Heritrix not handle on its own?

2) One thing we have found is that NAS assumes a domain X answers at the URL http://www.X, but that is not always true. We found hundreds of domains that have no www.X host but do answer at http://X. Maybe NAS should be changed to handle this in some way?
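
To illustrate the kind of pre-check meant in 2), here is a minimal Python sketch that picks a seed URL per domain by testing whether www.X resolves in DNS and falling back to X otherwise. This is not how NAS generates seeds; the function names resolves and pick_seed are made up for this example, and a successful DNS lookup only shows that the host name exists, not that a web server answers there.

```python
import socket
from typing import Callable, Optional


def resolves(host: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: no such host; UnicodeError: malformed host name
        return False


def pick_seed(domain: str,
              lookup: Callable[[str], bool] = resolves) -> Optional[str]:
    """Prefer http://www.X, fall back to http://X, None if neither resolves.

    'lookup' is injectable so the logic can be tried without network access.
    """
    for host in (f"www.{domain}", domain):
        if lookup(host):
            return f"http://{host}"
    return None
```

Run over a list of new domain names, something like this would give one seed per domain and a list of domains (those returning None) that need a manual look.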

Kind regards


Peter Svanberg
Technical officer
Collection and Metadata
Collection 1

Kungliga biblioteket
Box 5039, 102 41 Stockholm
Visiting address: Karlavägen 96, Stockholm
+46 10 709 32 78
Peter.Svanberg at kb.se
https://www.kb.se/

