Return a content-encoding header for resource timing and more #1796

Open
wants to merge 21 commits into
base: main

@guohuideng2024 guohuideng2024 commented Dec 13, 2024

The major change is to add content-encoding to the response header list. This PR also adds a description of how content-encoding is determined (content negotiation).

The purpose is to pass such value to resource timing. Further details are available at
w3c/resource-timing#381.

Note: Per the discussion at the 12/05/2024 WebPerfWG call (https://docs.google.com/document/d/1mpFDrAWuV6IgvJ1KiL9sgIlcboC5uArtF8r_oqS1Sco/edit?tab=t.0#heading=h.af6v74wysf4m), we decided to allow arbitrary "content-encoding" values at "fetch". We only filter the value on the client side, before passing it to resource timing.

Related PR to modify resource timing specification:
w3c/resource-timing#411

(See WHATWG Working Mode: Changes for more details.)

Bug: w3c/resource-timing#381


@annevk
Member

annevk commented Jan 7, 2025

Thanks for taking the time to pick this up. However, it doesn't seem like this addresses all the issues with #1742? I recommend studying the feedback on that PR.

@guohuideng2024
Author

> Thanks for taking the time to pick this up. However, it doesn't seem like this addresses all the issues with #1742? I recommend studying the feedback on that PR.

Hi Anne! I should have provided some background information here.

  1. You mentioned in "Pass in Content-Encoding to resource-timing" (#1742) that the spec must define how the value is determined. This PR tries to do that. The value is the result of content negotiation (determining which encoding should be used), so I tried to add that to the existing text.
    Note that #1742 is similar to the change made for the previously added contentType field. But contentEncoding is very different: it is not an extracted MIME type but the result of content negotiation. So this PR has to be very different from #1742.

  2. We originally thought that the filtering should happen at the "fetch" stage. But in the last web perf meeting Patrick brought up that the returned contentEncoding can be a proprietary value and that this value is needed by service workers. So the unfiltered value must be kept by the browser, and the filtering should happen right before the value is reported to resourceTiming.

https://docs.google.com/document/d/1mpFDrAWuV6IgvJ1KiL9sgIlcboC5uArtF8r_oqS1Sco/edit?tab=t.0#heading=h.af6v74wysf4m

Therefore, in this fetch doc I didn't mention filtering. I mentioned "filtering" in the resourceTiming spec:
w3c/resource-timing#411
And I am going to add more details about the filtering there.

Does this sound right to you? I am new to fetch and I may have missed a lot of things here. Thanks for your patience and guidance.
Guohui

chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this pull request Jan 16, 2025
This CL introduces a contentEncoding field to the PerformanceResourceTiming
object. This field is behind a feature flag.

PR to resource timing specification:
w3c/resource-timing#411
PR to fetch specification:
whatwg/fetch#1796

Bug: 327941462
Change-Id: I70cad190fe658fb3dbf8b401ff8393bc1d0782f0
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6098321
Commit-Queue: Guohui Deng <[email protected]>
Reviewed-by: Noam Rosenthal <[email protected]>
Reviewed-by: Matthew Denton <[email protected]>
Reviewed-by: Yoav Weiss (@Shopify) <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1407331}
aarongable pushed a commit to chromium/chromium that referenced this pull request Jan 16, 2025
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jan 21, 2025
…ourceTiming, a=testonly

Automatic update from web-platform-tests
Expose contentEncoding in PerformanceResourceTiming

--

wpt-commits: 1df2c3e47bcb6379ecf3a07735bd967101d02a5b
wpt-pr: 50115
guohuideng2024 and others added 2 commits January 21, 2025 15:54
1) formatting;
2) "gzip, GZIP" is OK because they match case-insensitively;
3) correct a mistake saying that "contentEncoding" consists of
digits;
4) no longer return "contentEncoding" for data URLs.
That's on the client side getting the response header.
Just add the content encoding to the body info.
@guohuideng2024
Author

Updated the patch: I just added the content encoding to the body info struct and added the clause that updates it.

@guohuideng2024
Author

Very sorry for so many mistakes, folks. Thanks for your patience.

@noamr
Contributor

noamr commented Jan 23, 2025

> Very sorry for so many mistakes, folks. Thanks for your patience.

No worries, we've all been there! (Or at least I have...)

i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Jan 24, 2025
@annevk
Member

annevk commented Feb 27, 2025

I think it should be specified here as that matches how we do MIME types and that reduces the chances of someone inadvertently exposing the information. In other words: the guarantee should come from Fetch, not from the caller.

@guohuideng2024
Author

> I think it should be specified here as that matches how we do MIME types and that reduces the chances of someone inadvertently exposing the information. In other words: the guarantee should come from Fetch, not from the caller.

The "raw" contentEncoding value can be an arbitrary proprietary compression that the app uses, and it is leaked as a response header.
So a new communication channel is indeed created :(.

Meanwhile, I think moving the filtering here guarantees that the only place where the raw contentEncoding is leaked is the fetch response header. I would state here that contentEncoding is filtered when accessed anywhere else.

If there is any concern, please let me know. Thanks.

@annevk
Member

annevk commented Mar 6, 2025

To be clear, the header is not exposed to the website passively embedding the resource, but this getter is. I don't think I understand your suggestion, could you rephrase?

@noamr
Contributor

noamr commented Mar 7, 2025

> > I think it should be specified here as that matches how we do MIME types and that reduces the chances of someone inadvertently exposing the information. In other words: the guarantee should come from Fetch, not from the caller.
>
> The "raw" contentEncoding value can be an arbitrary proprietary compression that the app uses, and it is leaked as a response header.
> So a new communication channel is indeed created :(.
>
> Meanwhile, I think moving the filtering here guarantees that the only place where the raw contentEncoding is leaked is the fetch response header. I would state here that contentEncoding is filtered when accessed anywhere else.
>
> If there is any concern, please let me know. Thanks.

Specifically, it needs to be explicitly filtered when assigned to the response body info struct.

@guohuideng2024 guohuideng2024 marked this pull request as draft March 7, 2025 23:34
@guohuideng2024 guohuideng2024 marked this pull request as ready for review March 8, 2025 23:14
@guohuideng2024
Author

> > > I think it should be specified here as that matches how we do MIME types and that reduces the chances of someone inadvertently exposing the information. In other words: the guarantee should come from Fetch, not from the caller.
> >
> > The "raw" contentEncoding value can be an arbitrary proprietary compression that the app uses, and it is leaked as a response header.
> > So a new communication channel is indeed created :(.
> > Meanwhile, I think moving the filtering here guarantees that the only place where the raw contentEncoding is leaked is the fetch response header. I would state here that contentEncoding is filtered when accessed anywhere else.
> > If there is any concern, please let me know. Thanks.
>
> Specifically, it needs to be explicitly filtered when assigned to the response body info struct.

Got it, Thanks! I updated the PR accordingly.

@guohuideng2024
Author

> To be clear, the header is not exposed to the website passively embedding the resource, but this getter is. I don't think I understand your suggestion, could you rephrase?

I think the website can get the arbitrary value like this: (I am new to this area so please correct me if I am wrong)

let myCoding = myHeaders.get("Content-Encoding");  //  |myCoding| can be a proprietary compression, i.e., an arbitrary value.

And the reason for that is some use cases involving service workers. See
w3c/resource-timing#381

but the contentEncoding field in resourceTiming is filtered, where only a few pre-determined values are permitted.
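
To make the contrast concrete, here is a minimal sketch of how a page might consume the filtered value. It is not taken from either specification. It assumes each resource timing entry exposes the filtered value as a `contentEncoding` string, as proposed in w3c/resource-timing#411; the helper name `countByContentEncoding`, the placeholder label, and the sample entries are hypothetical.

```javascript
// Hypothetical helper: tally resource timing entries by their filtered
// contentEncoding value. Assumes the field proposed in
// w3c/resource-timing#411; entries without it are counted separately.
function countByContentEncoding(entries) {
  const counts = new Map();
  for (const entry of entries) {
    const key =
      typeof entry.contentEncoding === "string"
        ? entry.contentEncoding
        : "(field not available)";
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Stand-in entries; in a page the input would be
// performance.getEntriesByType("resource").
const sampleEntries = [
  { name: "https://example.com/a.js", contentEncoding: "gzip" },
  { name: "https://example.com/b.css", contentEncoding: "gzip" },
  { name: "https://example.com/c.bin", contentEncoding: "unknown" },
  { name: "https://example.com/d.png", contentEncoding: "" },
  { name: "https://example.com/e.txt" },
];
const counts = countByContentEncoding(sampleEntries);
```

Unlike the raw header read through `myHeaders.get("Content-Encoding")`, the keys of this tally can only be values from the small filtered set, which is why a proprietary coding shows up as "unknown" here.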

@noamr
Contributor

noamr commented Mar 11, 2025

> > To be clear, the header is not exposed to the website passively embedding the resource, but this getter is. I don't think I understand your suggestion, could you rephrase?
>
> I think the website can get the arbitrary value like this: (I am new to this area so please correct me if I am wrong)
>
> let myCoding = myHeaders.get("Content-Encoding");  //  |myCoding| can be a proprietary compression, i.e., an arbitrary value.

You would only get access to myHeaders if this is an actual fetch or via a service worker; those channels are not always available.

> And the reason for that is some use cases involving service workers. See w3c/resource-timing#381
>
> but the contentEncoding field in resourceTiming is filtered, where only a few pre-determined values are permitted.

Yea, so filtering them when assigning to the struct wouldn't change anything observable, but any future user of that struct would get the filtered value.

@guohuideng2024
Author

Thank you Noam!

@annevk : Would you take one more look? (I also left a response at WebKit/standards-positions#467 )

Member

@annevk annevk left a comment

I noticed you're participating on behalf of Microsoft. That means you cannot sign the contributor's agreement as an individual. Microsoft has already signed up for the Fetch Workstream so you have to join the relevant GitHub organization (MicrosoftWHATWGContributors) and make your membership thereof public.

@@ -6319,6 +6321,24 @@ optional boolean <var>forceNewConnection</var> (default false), run these steps:
<li><p>Let <var>codings</var> be the result of <a>extracting header list values</a> given
`<code>Content-Encoding</code>` and <var>response</var>'s <a for=response>header list</a>.

<li><p>Let <var>filteredCoding</var> be "<code>unknown</code>".
Member

Why is this a distinct value from the empty string? It also seems to squat on the value space of the registry, which doesn't seem ideal?

Author

"unknown" means there is a compression that the browser does not recognize; the empty string means there is no compression.
We would like to distinguish the two. In this discussion thread, w3c/resource-timing#381, nhelfman points out that the two situations need to be distinguishable.

Member

Okay, what about the value space concern?

Member

Also, at the moment "unknown" is also used when the header could not be parsed. Maybe that's okay. Is that tested?

Author

  1. Could you help me understand the "value space concern"?
  2. Yes, I made two test cases where the contentEncoding value is filtered to the "unknown" value.
    https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/web_tests/external/wpt/resource-timing/content-encoding.https.html;l=36

Member

Well, in principle someone could register unknown as a content coding and user agents could implement it, but this API would not be able to distinguish it from the unknown case.
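
The collision described here can be illustrated with a small sketch. Everything in it is hypothetical: the function `reportCoding`, the set `futureSupported`, and above all the premise that a coding named "unknown" gets registered and implemented, which has not happened.

```javascript
// Hypothetical reporting step: a recognized coding is reported by name,
// anything else collapses to the "unknown" sentinel.
function reportCoding(coding, supportedCodings) {
  const lowered = coding.toLowerCase();
  return supportedCodings.has(lowered) ? lowered : "unknown";
}

// Hypothetical future in which a coding literally named "unknown" has been
// registered and implemented by the user agent.
const futureSupported = new Set(["gzip", "br", "unknown"]);

// A recognized coding that happens to be called "unknown".
const registeredCase = reportCoding("unknown", futureSupported);
// A coding the user agent does not recognize at all.
const unrecognizedCase = reportCoding("my-proprietary", futureSupported);
// Both calls return the same string, so a caller cannot tell them apart.
```

Reserving the sentinel so that it can never be registered as a content coding would remove the ambiguity.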

Author

Ah, I see! I think a solution is to make "unknown" a reserved word so that it cannot be used as a content coding in the response header. There is already one reserved word, "identity".

Do you think I should try to make that happen? I couldn't find how to propose changes to the IANA "HTTP parameters" registry. There is a "contact" section on that page, but the emails are obviously out of date (they are @sun.com :) ).

Member

Can you email Mark Nottingham [email protected] and copy [email protected] (that's me) with the information stated at https://httpwg.org/specs/rfc9110.html#content.coding.extensibility?

Copy link

> Do you think I should try to make that happen? I couldn't find how to propose changes to the IANA "HTTP parameters" registry. There is a "contact" section on that page, but the emails are obviously out of date (they are @sun.com :) ).

These are contacts for existing registrations; not the contact point for new registrations. Blame IANA for the confusion.

Author

I sent an email to Mark (and copied Anne).

fetch.bs Outdated
Comment on lines 6328 to 6329
<li><p>Otherwise, if <var>codings</var> contains two strings or more, set <var>filteredCoding</var> to
"<code>multiple</code>".
Member

then set*

And this should probably compare with Infra's size concept.

Author

Added "then". But I am not sure how to "compare with Infra's size concept". I looked at the "infra" section but didn't see a title related to "size". Would you please help me understand what else needs to be done with this paragraph? Thanks.

Member

Author

I see! I made changes accordingly.

fetch.bs Outdated

<li><p>Otherwise, if <var>codings[0]</var> is the empty string, or it is supported by the user agent,
and is listed in the <a href="https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding">
content encoding registry on IANA</a>, set <var>filteredCoding</var> to <var>codings[0]</var>.
Member

You want <cite>HTTP Content Coding Registry</cite> inline and then use a reference for the actual URL.

(And also apply the earlier comments.)

Author

Done.

Member

This isn't quite done. The casing is incorrect and you didn't move the URL into the references section.

Author

@guohuideng2024 guohuideng2024 Apr 14, 2025

Oh, I found another example of a reference, so I did the same: with an inline link reference in place, I also made an entry in the "references" section.

The reference is not in Specref yet, so I submitted this PR:
tobie/specref#860

I cannot verify this yet because the specref PR is not merged.
Does this look correct to you?

Member

It doesn't have to be in specref, you can also modify <pre class=biblio> in this document.

Author

Done. Do you think I should withdraw the PR to the specref? Thanks.

@annevk
Member

annevk commented Apr 9, 2025

Now with the filtering in place it's probably slightly more reasonable to leave the existing parsing issue unsolved for now, assuming there's adequate test coverage.

@guohuideng2024
Author

Thanks @annevk! I am working on MicrosoftWHATWGContributors membership right now.

fetch.bs Outdated

<li><p>Otherwise, if <var>codings</var>[0] is the empty string, or it is supported by the user agent,
and is listed in the <a href="https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding">
<cite>content encoding registry on IANA</cite></a>, then set <var>filteredCoding</var> to <var>codings</var>[0].
Member

What if the server specifies "GZIP"? Does it get lowercased?

Author

  1. Yes, all output should be lowercased, and I changed the text accordingly. Thanks for pointing that out.
  2. In the current text, a value is "allowed" as long as it case-insensitively matches a registered value, and it is lowercased before being exposed. Is that OK?

Thanks.

fetch.bs Outdated

<li><p>If <var>codings</var> is null, then set <var>filteredCoding</var> to the empty string.

<li><p>Otherwise, if <var>codings</var>'s <a for=list>size</a> is 2 or more, then set
Member

Suggested change
- <li><p>Otherwise, if <var>codings</var>'s <a for=list>size</a> is 2 or more, then set
+ <li><p>Otherwise, if <var>codings</var>'s <a for=list>size</a> is greater than 1, then set

Author

Done.

fetch.bs Outdated

<li><p>Set <var>response</var>'s <a for=response>body info</a>'s
<a for="response body info">content encoding</a> to the result of
<a lt=byte-lowercased>byte-lowercasing</a> <var>filteredCoding</var>.
Member

Suggested change
<a lt=byte-lowercased>byte-lowercasing</a> <var>filteredCoding</var>.
<a lt=byte-lowercased>byte-lowercasing</a> <var>filteredCoding</var>.

Author

Done. This sentence has been modified, and the superfluous space is gone.

fetch.bs Outdated
Comment on lines 6331 to 6335
<li><p>Otherwise, if <var>codings</var>[0] is the empty string, or it is supported by the user agent,
and is a <a>byte-case-insensitive</a> match for an entry listed in the
<a href="https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding">
<cite>HTTP Content Coding Registry</cite></a> of [[!IANA-HTTP-PARAMS]], then set
<var>filteredCoding</var> to <var>codings</var>[0].
Member

I think it would be clearer if you did the lowercasing here. There's no reason to lowercase the other branches.

Author

Good idea. Done.
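
For readers following the review, the filtering steps quoted in the snippets above can be sketched roughly as follows. This is an illustration assembled from those snippets, not the final spec text. The set `SUPPORTED_REGISTERED_CODINGS` is a hypothetical stand-in for "supported by the user agent and listed in the HTTP Content Coding Registry", and the function names are made up for this example. Lowercasing is applied only in the branch that returns a recognized coding, as suggested in the last review comment.

```javascript
// Hypothetical stand-in for "supported by the user agent and listed in the
// HTTP Content Coding Registry"; a real user agent would define its own set.
const SUPPORTED_REGISTERED_CODINGS = new Set(["gzip", "deflate", "br", "zstd"]);

// ASCII-only lowercasing, approximating byte-lowercasing on a JS string.
function asciiLowercase(value) {
  return value.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
}

// codings models the result of extracting header list values: null when the
// header is absent, an array of strings on success. Any other value (for
// example a parse failure) leaves the initial "unknown" in place.
function filterContentEncoding(codings) {
  let filteredCoding = "unknown";
  if (codings === null) {
    filteredCoding = "";
  } else if (Array.isArray(codings) && codings.length > 1) {
    filteredCoding = "multiple";
  } else if (Array.isArray(codings) && codings.length === 1) {
    const lowered = asciiLowercase(codings[0]);
    if (lowered === "" || SUPPORTED_REGISTERED_CODINGS.has(lowered)) {
      filteredCoding = lowered;
    }
  }
  return filteredCoding;
}
```

Under these assumptions, `filterContentEncoding(["GZIP"])` yields "gzip", while a proprietary coding such as `["my-proprietary"]` yields "unknown", so the raw header value never reaches callers that read the filtered field.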

@whatwg whatwg deleted a comment from guohuideng2024 Apr 16, 2025