<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle18
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Calibri",sans-serif;
        color:#1F497D;}
span.EmailStyle20
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="DA" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">I think this is more likely a playback problem than a crawl problem.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">BUT – when it comes to crawling:
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">To crawl multiple versions of an image, I think you would need a more advanced crawler such as Browsertrix, which, as far as I recall, can be configured to load pages at different “sizes”.
 That makes crawling such sites almost a site-by-site problem. Alternatively, Heritrix has a “URL rewrite extractor” (I forget the specific name) where you can configure rules such as “when you see a URL that matches this regular expression, also crawl a URL
 built from this pattern (based on parts of the original URL)”.<o:p></o:p></span></p>
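<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">A minimal sketch of what such a rule does, assuming a hypothetical site whose image URLs end in “_WIDTH.jpg”. The pattern, the width list and the function name are made up for illustration; this is not Heritrix configuration:<o:p></o:p></span></p>
<pre>
```python
import re

# Hypothetical site rule: image URLs look like ".../photo_800.jpg".
RULE = re.compile(r"^(.+_)(\d+)(\.jpg)$")

# Hypothetical list of widths the site is assumed to serve.
WIDTHS = (400, 800, 1200)


def extra_urls(url):
    """Return the other width variants to crawl when url matches the rule."""
    match = RULE.match(url)
    if match is None:
        return []
    prefix, seen, suffix = match.groups()
    # Rebuild the URL for every assumed width except the one already seen.
    return [f"{prefix}{w}{suffix}" for w in WIDTHS if w != int(seen)]
```
</pre>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">Note that this only works when the variants can be derived from the URL itself. It would not help where each variant carries its own signature parameter.<o:p></o:p></span></p>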
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">AND – when it comes to playback<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">I guess you can set up advanced playback rules that strip parts of URLs, so that when playback requests one specific URL you can serve a URL that “looks like it”. I'm not familiar with the specific possibilities
 in pywb, but my guess is that such options exist, or that there is some kind of plugin infrastructure where you can write your own rewrite rules for playback. Again, it would most likely require site-by-site configuration to make that work, so it is a very tough
 job for broad crawls of thousands of domains.<o:p></o:p></span></p>
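<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">As a rough sketch of the “looks like it” idea: if the requested image width was never captured, serve the nearest captured one. The function and the width-to-URL mapping are hypothetical and not part of pywb:<o:p></o:p></span></p>
<pre>
```python
def closest_capture(requested_width, captured):
    """Pick an archived URL for a requested image width.

    captured maps a width in pixels to the archived URL of that variant
    (a hypothetical lookup table, e.g. built from the page's srcset).
    """
    if not captured:
        return None
    # Prefer the smallest captured width that is at least as large as requested,
    # so the image is scaled down rather than up.
    larger = [w for w in captured if w >= requested_width]
    # Otherwise fall back to the largest width that was captured.
    best = min(larger) if larger else max(captured)
    return captured[best]
```
</pre>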
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">The one good thing is that such advanced crawling and playback configurations could, and should, be shared within the community for common applications used all over the internet.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">Best<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D">Bjarne<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US" style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="mso-fareast-language:DA">From:</span></b><span style="mso-fareast-language:DA"> NetarchiveSuite-users <netarchivesuite-users-bounces@ml.sbforge.org>
<b>On Behalf Of </b>Peter Svanberg<br>
<b>Sent:</b> Wednesday, August 23, 2023 10:50 AM<br>
<b>To:</b> netarchivesuite-users@ml.sbforge.org<br>
<b>Subject:</b> [Netarchivesuite-users] Problems with non-fetched image resolutions (srcset data, responsive pages)<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span lang="EN-GB">Hello! (sending this to Slack also)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">We have problems with harvested responsive pages not having all image resolutions to show.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">This was supposedly solved in Heritrix issues
<a href="https://github.com/internetarchive/heritrix3/issues/477">#477</a> and <a href="https://github.com/internetarchive/heritrix3/issues/478">
#488</a>, whose fixes seem to be present in the 3.4.0-NAS-7.4.3 version in NAS. (There are several 3.4.0 releases, but judging by the dates in the jar files it is probably 3.4.0-20220727. Is the full release name stated somewhere?)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">We get outlink lines in the WARC files like the one described in
<a href="https://github.com/internetarchive/heritrix3/issues/477">#477</a>: </span>
<span lang="EN-GB" style="font-size:11.5pt;font-family:"Arial",sans-serif;color:#1D1C1D;background:#F8F8F8">(this is one long line)</span><span lang="EN-GB"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;font-family:"Courier New"">outlink:
<a href="https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=551&q=80&upscale=true&w=980&s=c1baa7b31d1bfa8fbb1080da654cb82e48cc513d%20980w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=506&q=80&upscale=true&w=900&s=5cf1df8749c0482e2f7122ba36fa30c29ff5895e%20900w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=450&q=80&upscale=true&w=800&s=7a48dafd5bbd885311c40dc02ab942aeb6999f02%20800w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=394&q=80&upscale=true&w=700&s=5474c37ad3f95dd4a6659f1554ce6ee0722a941a%20700w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=338&q=80&upscale=true&w=600&s=6d93ce794d8d7a1d8006d8ee6b95c3ab8324fc5a%20600w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=281&q=80&upscale=true&w=500&s=125437d7fbf93241a2d43f4f992aefe98efc524b%20500w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=225&q=80&upscale=true&w=400&s=5193e3067db868c1852fa0f3960d92118ab41a25%20400w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=169&q=80&upscale=true&w=300&s=dd7d57437c2c130f6dfa47b3b2c0eb7fdf859a32%20300w">
https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=551&q=80&upscale=true&w=980&s=c1baa7b31d1bfa8fbb1080da654cb82e48cc513d%20980w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=506&q=80&upscale=true&w=900&s=5cf1df8749c0482e2f7122ba36fa30c29ff5895e%20900w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=450&q=80&upscale=true&w=800&s=7a48dafd5bbd885311c40dc02ab942aeb6999f02%20800w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=394&q=80&upscale=true&w=700&s=5474c37ad3f95dd4a6659f1554ce6ee0722a941a%20700w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=338&q=80&upscale=true&w=600&s=6d93ce794d8d7a1d8006d8ee6b95c3ab8324fc5a%20600w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=281&q=80&upscale=true&w=500&s=125437d7fbf93241a2d43f4f992aefe98efc524b%20500w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=225&q=80&upscale=true&w=400&s=5193e3067db868c1852fa0f3960d92118ab41a25%20400w,%20https://svd.vgc.no/v2/images/c3a843d6-8404-47ae-a3f5-2480efdd2709?fit=crop&h=169&q=80&upscale=true&w=300&s=dd7d57437c2c130f6dfa47b3b2c0eb7fdf859a32%20300w</a>
 E source/@srcset<o:p></o:p></span></p>
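<p class="MsoNormal"><span lang="EN-GB">For comparison, a minimal sketch of how the srcset value could be split into separate candidate URLs instead of one fused outlink. It assumes that the %20 sequences are encoded spaces and that candidates are separated by a comma followed by whitespace. The function name is hypothetical and this is not the ExtractorHTML implementation:<o:p></o:p></span></p>
<pre>
```python
import re


def split_srcset(value):
    """Split a srcset value into (url, descriptor) pairs.

    Assumes candidates are separated by a comma followed by whitespace,
    as in the outlink line above; this is not a full HTML-spec parser.
    """
    # Assumption: every "%20" in the fused outlink was originally a space.
    text = value.replace("%20", " ").strip()
    candidates = []
    for part in re.split(r",\s+", text):
        fields = part.split()
        if not fields:
            continue
        url = fields[0]
        # The descriptor ("980w", "2x") is optional in srcset.
        descriptor = fields[1] if len(fields) == 2 else ""
        candidates.append((url, descriptor))
    return candidates
```
</pre>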
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">So is the ExtractorHTML code in the jar file the wrong version, or is the bug still not solved? (Or have I misunderstood?)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">Side track: are there reasonably easy ways to handle all the existing WARC files containing srcset pages where most resolutions are missing? Could pywb use the resolutions that are available instead of showing nothing?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">(What happens now is that if the image for the current browser window size is missing, no image is shown. If you make the browser window smaller, the image may suddenly show up. This is the case when there is just one image.
 It seems more complex when there are several.)<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB">Regards,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB"><o:p> </o:p></span></p>
<table class="MsoNormalTable" border="0" cellpadding="0">
<tbody>
<tr>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal"><a href="https://www.kb.se/"><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:blue;mso-fareast-language:SV;text-decoration:none"><img border="0" width="113" height="170" style="width:1.177in;height:1.7708in" id="_x0000_i1025" src="https://signaturloggor.kb.se/png/Outlook%20logo%20m%d0%a4rkbl%d0%96.png" alt="KB Logo"></span></a><span style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p></o:p></span></p>
</td>
<td style="padding:0cm 0cm 0cm 5.25pt">
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">Peter Svanberg</span></b><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">Technical officer
</span></b><span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">Acquisitions and Metadata Department<br>
Film, Games, Sheet Music and Web Unit<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:9.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p> </o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<b><span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">National Library of Sweden</span></b><span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span lang="EN-GB" style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">PO Box 5039, SE-102 41 Stockholm<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">Visits: Karlavägen 96, Stockholm<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV">+46 10-709 32 78<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-top-alt:2.0pt;margin-right:0cm;margin-bottom:1.0pt;margin-left:0cm">
<span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><a href="mailto:Peter.Svanberg@kb.se">Peter.Svanberg@kb.se</a><o:p></o:p></span></p>
<p class="MsoNormal"><a href="https://www.kb.se/"><span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:blue;mso-fareast-language:SV">www.kb.se</span></a><span style="font-size:8.0pt;font-family:"Arial",sans-serif;color:black;mso-fareast-language:SV"><o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>