[Netarchivesuite-users] Problems with a seed

Stephen Hunt sthu at kb.dk
Fri Nov 14 12:48:46 CET 2025


Hi

It looks like it will need to have a cookie in the request and/or response header to work.
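
A quick way to test this outside the crawler is to request the page twice with curl. The first request stores any `Set-Cookie` values in a cookie jar (`-c`), and the second sends them back (`-b`). This is a minimal sketch assuming a POSIX shell and curl. `fetch_with_cookie_jar` is a hypothetical name, and the helper approximates browser-style cookie handling, not what Heritrix does.

```shell
#!/bin/sh
# Hypothetical helper: compare the status of a first, cookie-less GET with a
# second GET that replays whatever cookies the first response set.
fetch_with_cookie_jar() {
  url=$1
  jar=$(mktemp)   # temporary cookie file, removed at the end
  # -c writes cookies received in responses to the jar.
  # "|| true" keeps a failed connection (status 000) from aborting under set -e.
  first=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 20 -c "$jar" "$url" || true)
  # -b sends the stored cookies back; -c keeps the jar updated.
  second=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 20 -b "$jar" -c "$jar" "$url" || true)
  rm -f "$jar"
  printf 'without cookie: %s\nwith cookie:    %s\n' "$first" "$second"
}
```

For example, `fetch_with_cookie_jar https://www.lineaverdesierraguadarrama.com/`. A 404 on the first line and a 200 on the second would support the cookie theory. A status of 000 means no HTTP response was received at all.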

Can you check in the WARC file whether the response header from https://www.lineaverdesierraguadarrama.com has the cookie information set?
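
That check can be scripted. The sketch below assumes a POSIX shell with gzip and awk, and relies on a simplified reading of the WARC layout: a `WARC/1.0` or `WARC/1.1` line, the WARC headers, a blank line and then, for `response` records, the captured HTTP header block. `warc_set_cookies` is a hypothetical name. It prints the target URI and the value of every `Set-Cookie` header found in matching response records.

```shell
#!/bin/sh
# Hypothetical helper: print "<target-uri><TAB><cookie>" for every Set-Cookie
# header found in the HTTP headers of WARC response records.
# Usage: warc_set_cookies FILE.warc[.gz] [TARGET-URI]
warc_set_cookies() {
  # gzip -dc also handles per-record (multi-member) .warc.gz files;
  # with -f, GNU gzip passes an uncompressed .warc through unchanged.
  gzip -dcf "$1" | awk -v want="${2:-}" '
    { sub(/\r$/, "") }                                  # strip CRLF line ends
    /^WARC\/1\.[01]$/ { state = "warc"; type = ""; uri = ""; next }
    state == "warc" {                                   # WARC record headers
      if ($0 == "") { state = (type == "response") ? "http" : "body"; next }
      i = index($0, ":")
      name = tolower(substr($0, 1, i - 1))
      value = substr($0, i + 1); sub(/^[ \t]+/, "", value)
      if (name == "warc-type") type = value
      if (name == "warc-target-uri") { gsub(/[<>]/, "", value); uri = value }
      next
    }
    state == "http" {                                   # captured HTTP headers
      if ($0 == "") { state = "body"; next }            # payload starts here
      if (tolower($0) ~ /^set-cookie:/ && (want == "" || uri == want)) {
        i = index($0, ":")
        value = substr($0, i + 1); sub(/^[ \t]+/, "", value)
        print uri "\t" value
      }
    }
  '
}
```

For example, `warc_set_cookies harvest.warc.gz https://www.lineaverdesierraguadarrama.com/` (the file name is a placeholder) prints nothing if no cookie was recorded for that URI. Leaving out the URI lists cookies from every response record, which helps if the seed was stored under a slightly different URI.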

Best regards,

Stephen

________________________________
From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> on behalf of Bjarne Andersen <bja at kb.dk>
Sent: 7 November 2025 14:49
To: netarchivesuite-users at ml.sbforge.org
Subject: Re: [Netarchivesuite-users] Problems with a seed

The curl command gives me a 200 as well:

bja at bja-linux3:~$ curl -LsI https://www.lineaverdesierraguadarrama.com/
HTTP/2 200
cache-control: private
content-length: 16857
content-type: text/html
server: Microsoft-IIS/10.0
set-cookie: ASPSESSIONIDAGCQDTSQ=FLPBDMPCLICDHIBNHPHCOIGN; secure; path=/
x-powered-by: ASP.NET
x-powered-by-plesk: PleskWin
date: Fri, 07 Nov 2025 13:47:12 GMT

Could it be something to do with the User-Agent that gives different results? (Just a guess.)
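
To test the User-Agent guess, the same GET can be repeated with different User-Agent strings while printing only the final status code. This is a minimal sketch assuming curl. `ua_statuses` is a hypothetical name, and any User-Agent strings passed to it are placeholders: for a meaningful comparison, copy the exact string the harvest uses from its Heritrix template.

```shell
#!/bin/sh
# Hypothetical helper: GET one URL once per User-Agent argument and print
# "<status><TAB><user-agent>" for each. Status 000 means no HTTP response.
ua_statuses() {
  url=$1
  shift
  for ua in "$@"; do
    # -L follows redirects, so the code printed is that of the last response.
    # "|| true" keeps a failed connection from aborting under set -e.
    code=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 20 -A "$ua" "$url" || true)
    printf '%s\t%s\n' "$code" "$ua"
  done
}
```

For example, `ua_statuses https://www.lineaverdesierraguadarrama.com/ "curl/8.5.0" "<User-Agent from the harvest template>"`. If only the crawler's string gets a 404, the server is filtering on User-Agent.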

best

Bjarne

From: NetarchiveSuite-users <netarchivesuite-users-bounces at ml.sbforge.org> On Behalf Of Soleto Ruiz de Clavijo, Miguel
Sent: Friday, November 7, 2025 11:43 AM
To: 'NetarchiveSuite-users at ml.sbforge.org' <netarchivesuite-users at ml.sbforge.org>
Subject: [Netarchivesuite-users] Problems with a seed

Dear all,

I'm having trouble downloading a site with NAS. Specifically, it's this seed: https://www.lineaverdesierraguadarrama.com/
When I start the job, it returns a 404, but that URL works fine in a browser.

I ran the following tests:

curl -LsI https://www.lineaverdesierraguadarrama.com/
HTTP/2 404
content-length: 1245
content-type: text/html
server: Microsoft-IIS/10.0
x-powered-by: ASP.NET
x-powered-by-plesk: PleskWin
date: Fri, 07 Nov 2025 08:43:09 GMT

wget https://www.lineaverdesierraguadarrama.com/
--2025-11-07 09:43:16--  https://www.lineaverdesierraguadarrama.com/
Resolving www.lineaverdesierraguadarrama.com (www.lineaverdesierraguadarrama.com)... 195.55.124.177
Connecting to www.lineaverdesierraguadarrama.com (www.lineaverdesierraguadarrama.com)[195.55.124.177]:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16857 (16K) [text/html]
Saving to: "index.html.1"

It seems the server responds with a 404 when it receives a HEAD request.
Is there any way to configure the Heritrix template to make it use GET directly?
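
Since `curl -I` sends a HEAD request while the wget run above sends GET, the HEAD theory can be checked by requesting the same URL both ways with everything else held constant. This is a minimal sketch assuming curl. `head_vs_get` and its default User-Agent are illustrative, and the helper only diagnoses the server's behaviour; it does not change anything in the Heritrix template.

```shell
#!/bin/sh
# Hypothetical helper: request one URL with HEAD and with GET and print both
# final status codes, to see whether the server treats the methods differently.
head_vs_get() {
  url=$1
  ua=${2:-Mozilla/5.0}   # placeholder User-Agent; pass the crawler's own as $2
  # -I makes curl send HEAD; -w prints only the status of the last response.
  head_code=$(curl -sLI -o /dev/null -w '%{http_code}' --max-time 20 -A "$ua" "$url" || true)
  # Without -I, curl sends GET; the body is discarded via -o /dev/null.
  get_code=$(curl -sL -o /dev/null -w '%{http_code}' --max-time 20 -A "$ua" "$url" || true)
  printf 'HEAD %s\nGET  %s\n' "$head_code" "$get_code"
}
```

If `head_vs_get https://www.lineaverdesierraguadarrama.com/` reports the same code for both methods, the 404 is more likely tied to something else in the request, such as the User-Agent or cookies.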

Thanks in advance.

Best regards.

________________________________

The information in this e-mail and any attachments is confidential and it is intended for the addressee only. If you have received this e-mail in error, you are notified that any revision, amendment, print, copy, disclosure, distribution or use of the contents is unauthorized. Carrying out any of the above actions, is expressly banned by rules governing this matter. Hence we request that if you are not the intended recipient, please notify the sender answering this e-mail, and delete the message and any attachments. The National Library of Spain reserves itself the right to take the appropriate legal actions in the event of the above mentioned matter is being infringed.

________________________________