[Netarchivesuite-users] Problems with a seed
Bjarne Andersen
bja at kb.dk
Fri Nov 7 14:49:03 CET 2025
The curl command gives me a 200 as well:
bja@bja-linux3:~$ curl -LsI https://www.lineaverdesierraguadarrama.com/
HTTP/2 200
cache-control: private
content-length: 16857
content-type: text/html
server: Microsoft-IIS/10.0
set-cookie: ASPSESSIONIDAGCQDTSQ=FLPBDMPCLICDHIBNHPHCOIGN; secure; path=/
x-powered-by: ASP.NET
x-powered-by-plesk: PleskWin
date: Fri, 07 Nov 2025 13:47:12 GMT
Could the User-Agent be causing the different results? (Just a guess.)
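One way to test that guess is to send the same URL with both HEAD and GET under a couple of different User-Agent strings and compare the status codes. The following is only a minimal sketch using Python's standard library. The function names `probe` and `probe_matrix` and the two User-Agent strings are made up for illustration and do not come from NAS or Heritrix.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent strings for comparison; neither is taken from the thread.
USER_AGENTS = {
    "curl-like": "curl/8.5.0",
    "browser-like": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}


def probe(url, method, user_agent, timeout=30):
    """Send one request and return the HTTP status code the server answers with."""
    request = urllib.request.Request(
        url, method=method, headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still the answer we want.
        return error.code


def probe_matrix(url, user_agents=USER_AGENTS):
    """Return {(method, agent_label): status} for HEAD and GET with each User-Agent."""
    results = {}
    for method in ("HEAD", "GET"):
        for label, agent in user_agents.items():
            results[(method, label)] = probe(url, method, agent)
    return results
```

Calling `probe_matrix("https://www.lineaverdesierraguadarrama.com/")` from each of the two machines would show whether the status depends on the method, on the User-Agent, or on neither. If it depends on neither, the difference is more likely tied to where the request comes from.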
best
Bjarne
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Soleto Ruiz de Clavijo, Miguel
Sent: Friday, November 7, 2025 11:43 AM
To: 'NetarchiveSuite-users at ml.sbforge.org' <netarchivesuite-users at ml.sbforge.org>
Subject: [Netarchivesuite-users] Problems with a seed
Dear all,
I'm having trouble downloading a site with NAS. Specifically, it's this seed: https://www.lineaverdesierraguadarrama.com/
When I start the job, it returns a 404, but that URL works fine in a browser.
I ran the following tests:
curl -LsI https://www.lineaverdesierraguadarrama.com/
HTTP/2 404
content-length: 1245
content-type: text/html
server: Microsoft-IIS/10.0
x-powered-by: ASP.NET
x-powered-by-plesk: PleskWin
date: Fri, 07 Nov 2025 08:43:09 GMT
wget https://www.lineaverdesierraguadarrama.com/
--2025-11-07 09:43:16-- https://www.lineaverdesierraguadarrama.com/
Resolving www.lineaverdesierraguadarrama.com (www.lineaverdesierraguadarrama.com)... 195.55.124.177
Connecting to www.lineaverdesierraguadarrama.com (www.lineaverdesierraguadarrama.com)[195.55.124.177]:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16857 (16K) [text/html]
Saving to: "index.html.1"
It seems the server responds with a 404 when it receives a HEAD request.
Is there any way to configure the Heritrix template to make it use GET directly?
Thanks in advance.
Best regards.