[Netarchivesuite-users] Problems with non-fetched image resolutions (srcset data, responsive pages)

Peter Svanberg Peter.Svanberg at kb.se
Wed Aug 23 10:49:59 CEST 2023


Hello! (sending this to Slack also)

We have a problem with harvested responsive pages: not all image resolutions are fetched, so some images cannot be shown.

This was supposedly solved in Heritrix issue #477<https://github.com/internetarchive/heritrix3/issues/477> and #478<https://github.com/internetarchive/heritrix3/issues/478>, which seem to be present in the 3.4.0-NAS-7.4.3 version in NAS. (There are several 3.4.0 releases, but according to the dates in the jar files it is probably 3.4.0-20220727. Is the full release name stated somewhere?)

We get outlink lines in the WARC like the one described in #477<https://github.com/internetarchive/heritrix3/issues/477>: (this is one long line)

outlink: https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=551&q=80&upscale=true&w=980&s=c1baa7b31d1bfa8fbb1080da654cb82e48cc513d%20980w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=506&q=80&upscale=true&w=900&s=5cf1df8749c0482e2f7122ba36fa30c29ff5895e%20900w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=450&q=80&upscale=true&w=800&s=7a48dafd5bbd885311c40dc02ab942aeb6999f02%20800w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=394&q=80&upscale=true&w=700&s=5474c37ad3f95dd4a6659f1554ce6ee0722a941a%20700w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=338&q=80&upscale=true&w=600&s=6d93ce794d8d7a1d8006d8ee6b95c3ab8324fc5a%20600w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=281&q=80&upscale=true&w=500&s=125437d7fbf93241a2d43f4f992aefe98efc524b%20500w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=225&q=80&upscale=true&w=400&s=5193e3067db868c1852fa0f3960d92118ab41a25%20400w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=169&q=80&upscale=true&w=300&s=dd7d57437c2c130f6dfa47b3b2c0eb7fdf859a32%20300w E source/@srcset
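
To make the problem concrete: the outlink above is a whole srcset value recorded as a single URL, with the separating spaces encoded as %20. A minimal Python sketch of how such a value breaks down into separate image candidates follows. The function name split_srcset is hypothetical, this is not Heritrix's ExtractorHTML code, and it assumes that candidates are separated by a comma followed by whitespace and that every %20 in the value is an encoded separator space.

```python
import re


def split_srcset(value):
    """Split a srcset value into (url, descriptor) pairs.

    Assumption: every "%20" in the value is an encoded separator space,
    as in the outlink line quoted above, and candidates are separated
    by a comma followed by whitespace.
    """
    text = value.replace("%20", " ").strip()
    candidates = []
    for part in re.split(r",\s+", text):
        fields = part.split()
        if not fields:
            continue
        url = fields[0]
        # The descriptor (for example "980w") is optional.
        descriptor = fields[1] if len(fields) > 1 else ""
        candidates.append((url, descriptor))
    return candidates


if __name__ == "__main__":
    # Shortened, made-up example in the same shape as the outlink line.
    sample = ("https://example.org/img?w=980%20980w,"
              "%20https://example.org/img?w=300%20300w")
    for url, descriptor in split_srcset(sample):
        print(descriptor, url)
```

Each URL returned this way would be a separate outlink to fetch, which is what the harvest should contain instead of the single combined line.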

So, wrong ExtractorHTML code in the jar file or bug still not solved? (Or have I misunderstood?)

Side track: Is there a reasonably easy way to handle all the existing WARC files containing srcset pages where most resolutions are missing? Could Pywb use the resolutions that are available, instead of showing nothing?

(What happens now is that if the image for the current web browser window size is missing, no image is shown. If you make your browser window smaller, the image may suddenly show up. This is when there is just one image. Seems more complex when there are several.)
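
One way to picture the side-track idea is to rewrite the srcset at replay time so that it lists only the candidates that were actually captured; the browser would then pick among those. The Python sketch below only illustrates that idea. It is not Pywb's API, and the names keep_available and archived_urls are hypothetical; how the set of captured URLs would be obtained (for example from an index lookup) is left open.

```python
import re


def keep_available(srcset, archived_urls):
    """Return a srcset value listing only candidates whose URL was captured.

    srcset: the original attribute value, with ordinary spaces.
    archived_urls: a set of URLs known to exist in the archive
    (hypothetical input; how it is obtained is not shown here).
    """
    kept = []
    for part in re.split(r",\s+", srcset.strip()):
        fields = part.split()
        if fields and fields[0] in archived_urls:
            # Keep the URL together with its descriptor, if any.
            kept.append(" ".join(fields))
    return ", ".join(kept)


if __name__ == "__main__":
    original = ("https://example.org/img?w=980 980w, "
                "https://example.org/img?w=300 300w")
    captured = {"https://example.org/img?w=300"}
    print(keep_available(original, captured))
```

If no candidate was captured the result is an empty string, in which case a replay tool would have to fall back to the plain src attribute.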

Regards,

Peter Svanberg
Technical officer
Acquisitions and Metadata Department
Film, Games, Sheet Music and Web Unit

National Library of Sweden
PO Box 5039, SE-102 41 Stockholm
Visits: Karlavägen 96, Stockholm
+46 10-709 32 78
Peter.Svanberg at kb.se<mailto:Peter.Svanberg at kb.se>
www.kb.se<https://www.kb.se/>


