Standards
 
  #1  
April 25th, 2007, 09:40 AM
Yves Lafon
i28 proposed replacement text

ref: #i28

In section 4.4 the current text is:
<<<
2.If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.

Here is the proposed clarification text:
<<<
2: If a Transfer-Encoding header field (Section 14.41) is present, and
has any value other than "identity", then the "chunked"
transfer-coding is used, then the transfer-length is
defined by use of the "chunked" transfer-coding (Section 3.6)

Even though any value other than "identity" already implies that "chunked"
MUST be present, this wording is less open to interpretation.

The "unless the message is terminated by closing the connection" clause
would mean that Transfer-Encoding is being used with HTTP/1.0 and the
connection is closed to signal the end of the message. That case is already
covered by item 5, with the caveat right after it: "For compatibility with
HTTP/1.0 applications".

--
Baroula que barouleras, au toujou t'entourneras.

~~Yves

  #2  
April 18th, 2008, 07:30 AM
Yves Lafon

On Thu, 3 Apr 2008, Mark Nottingham wrote:

>Yves,
>
>Do you mean:
>
>>2: If a Transfer-Encoding header field (Section 14.41) is present, and
>>has any value other than "identity", then the transfer-length is
>>defined by use of the "chunked" transfer-coding (Section 3.6).
>
>? (I think there was a cut-and-paste error, perhaps).

Yes, as there is already a MUST (use chunked transfer-coding) in part 02,
section 3.4.

>If so, +1.
>
>comment -- this is good, but my reading of the "unless the message is
>terminated" text is that it was intended to cover the case where the
>connection is prematurely closed. It may be useful to have text added below
>the numbered list along these lines:
>
>"""
>If a message has a defined length (e.g., using chunked encoding,
>Content-Length, or multipart/byteranges), and the connection is prematurely
>closed, then the transfer-length will be less than indicated, and the message
>is incomplete.
>"""

+1
We can also add that it's an error (even if it seems obvious).

>This leaves the question open of whether we want to place any additional
>requirements on incomplete messages (which should probably be considered
>separately).

We can't tell, from such an error, whether the close was used to signal an
error or was itself an unplanned error. There is also the issue of partially
caching such responses to a GET, or what to do when it's a POST. It all goes
in the "error recovery" bucket.
We have some error recovery (as in part 1, 7.2.4); something along the same
lines for this case should be OK.
Cheers,

--
Baroula que barouleras, au toujou t'entourneras.

~~Yves
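
As a rough illustration of the "incomplete message" case discussed above, here is a minimal sketch of a chunked-body decoder that reports a premature close instead of silently returning a short body. The names `decode_chunked` and `IncompleteMessage` are hypothetical, and the sketch assumes the whole received byte string is available at once; chunk extensions are skipped and trailer fields are not parsed.

```python
class IncompleteMessage(Exception):
    """Raised when the input ends before the chunked body is complete."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode a chunked body; raise IncompleteMessage if it was cut short."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise IncompleteMessage("input ended inside a chunk-size line")
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            # last-chunk seen: the body is complete (trailers ignored here).
            return bytes(body)
        chunk = raw[pos:pos + size]
        if len(chunk) < size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise IncompleteMessage("input ended (or was malformed) inside chunk data")
        body += chunk
        pos += size + 2
```

A Content-Length delimited body can be checked the same way by comparing the declared length with the number of bytes actually received before the close.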

  #3  
May 12th, 2008, 06:01 AM
Henrik Nordstrom

On Mon, 2008-05-12 at 05:42 -0400, Yves Lafon wrote:
>>3. If a Transfer-Encoding header field (Section 14.41) is present, and
>>has a value other than "identity", then the transfer-length
>>is defined by closing the connection.
>
>And in that case, you have no guarantee that the message is complete or
>not (unless you use a Transfer-Encoding that gives that property, if the
>recipient knows about this encoding, which is not guaranteed).

, except for the last part. Closing the connection as a delimiting
method is only possible on responses, and there any transfer encoding
other than chunked is negotiated using TE.

Both gzip & deflate (zlib) are self-delimiting and checksummed, by the
way, giving you this property. So the only transfer encoding available
today in HTTP/1.1 not having this property is the "identity"
non-encoding without Content-Length.

Regards
Henrik

  #4  
May 12th, 2008, 06:01 AM
Yves Lafon

On Mon, 12 May 2008, Henrik Nordstrom wrote:

>>Here is the proposed clarification text:
>><<<
>>2: If a Transfer-Encoding header field (Section 14.41) is present, and
>>has any value other than "identity", then the transfer-length is
>>defined by use of the "chunked" transfer-coding (Section 3.6)
>
>No, this is wrong. See the section on transfer encoding.
>
>The following message header is perfectly legal and is what the "closing
>the connection" part is about:
>
>HTTP/1.1 200 OK
>Transfer-Encoding: gzip
>Connection: close
>
>What it should read is something like the following:
>
>2. If a Transfer-Encoding header field (Section 14.41) is present, and
>indicates that "chunked" was the last encoding applied to the
>message-body then the transfer-length is defined by use of the
>"chunked" transfer-coding (Section 3.6).
>
>3. If a Transfer-Encoding header field (Section 14.41) is present, and
>has a value other than "identity", then the transfer-length
>is defined by closing the connection.

And in that case, you have no guarantee that the message is complete or
not (unless you use a Transfer-Encoding that gives that property, if the
recipient knows about this encoding, which is not guaranteed).

>Side Note: It's illegal to apply any transfer-encoding after chunked
>encoding simply because it would be a complete waste. Technically the
>message format does support inner levels of chunked encoding.


--
Baroula que barouleras, au toujou t'entourneras.

~~Yves
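
The two rules proposed above (chunked last means chunked framing; any other non-identity coding means close-delimited) can be sketched as a small decision function. This is only an illustration under stated assumptions: `delimiting_method` is a hypothetical name, header names are assumed to be already lower-cased, only responses are considered, and the other cases of section 4.4 (bodiless responses, multipart/byteranges) are left out.

```python
def delimiting_method(headers: dict) -> str:
    """Return how the end of a response body is found.

    Possible results: "chunked", "close" or "content-length".
    """
    value = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    # "identity" means no transformation, so it does not count as a coding.
    codings = [c for c in codings if c != "identity"]
    if codings:
        if codings[-1] == "chunked":
            # Proposed rule 2: chunked was the last coding applied.
            return "chunked"
        # Proposed rule 3: some other coding, e.g. "Transfer-Encoding: gzip".
        return "close"
    if "content-length" in headers:
        return "content-length"
    # No length information at all: the body runs until the close.
    return "close"
```

With the example response quoted above (`Transfer-Encoding: gzip` plus `Connection: close`) this returns "close", while `Transfer-Encoding: gzip, chunked` returns "chunked".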

  #5  
May 12th, 2008, 10:50 AM
Jamie Lokier

Henrik Nordstrom wrote:
>Both gzip & deflate (zlib) are self-delimiting and checksummed, by the
>way, giving you this property.

Are you sure? Concatenating gzip files (.gz) is allowed: when
decompressed it results in the concatenation of the decompressed
parts. Therefore gzip _files_ aren't self delimiting.

I don't know if gzip-as-referenced-by-HTTP allows that, but given that
gzip files can, it would be inadvisable for the network protocol to be
different and depend on that difference.

-- Jamie
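
The concatenation property of gzip files mentioned above is easy to check with Python's standard `gzip` module, whose `decompress` function accepts multi-member data. This is just a small demonstration of the file format's behaviour, not a statement about what HTTP permits.

```python
import gzip

# Two independent gzip members, as produced by compressing two files.
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world")

# Concatenating the members gives another valid gzip stream ...
combined = first + second

# ... which decompresses to the concatenation of the original parts.
decoded = gzip.decompress(combined)
print(decoded)
```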

  #6  
May 12th, 2008, 12:30 PM
Henrik Nordstrom

On Mon, 2008-05-12 at 16:10 +0100, Jamie Lokier wrote:

>Are you sure? Concatenating gzip files (.gz) is allowed: when
>decompressed it results in the concatenation of the decompressed
>parts. Therefore gzip _files_ aren't self delimiting.

True. But each part/member is, and if the sender sends a single member
it can be sure that the recipient can tell if the message got truncated
even if there are no other forms of delimiting. It's the sender that
chooses to send the message without any other form of delimiting.

And in this case the protocol does not really depend on the delimiting
being detected properly. The message delimiting is the same as identity
encoding without Content-Length, by closing the connection. But unlike
identity encoding the recipient can clearly tell if the message got
unexpectedly truncated, with the small exception of a sender sending
multiple gzip members in the same stream and the message getting
truncated exactly between two members.

Regards
Henrik
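
The truncation argument above can be tried out with `zlib` in gzip mode. The sketch below is illustrative only: `decode_gzip_members` is a hypothetical helper that decodes one member at a time and reports whether the input ended on a member boundary. It relies on the `eof` and `unused_data` attributes of `zlib` decompression objects.

```python
import gzip
import zlib


def decode_gzip_members(raw: bytes):
    """Decode concatenated gzip members.

    Returns (data, complete), where complete is True only if the input
    ended exactly at the end of a member (trailer checksum included).
    """
    out = bytearray()
    buf = raw
    complete = False
    while buf:
        # 16 + MAX_WBITS selects the gzip container instead of raw zlib.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out += d.decompress(buf)
        complete = d.eof
        if not complete:
            break  # input ended in the middle of a member: truncated
        buf = d.unused_data  # bytes belonging to the next member, if any
    return bytes(out), complete


one = gzip.compress(b"a" * 1000)
two = gzip.compress(b"b" * 1000)
print(decode_gzip_members(one)[1])               # single complete member
print(decode_gzip_members(one[:-4])[1])          # cut inside the trailer
print(decode_gzip_members((one + two)[:len(one)])[1])  # cut between members
```

The last call is the exception described above: a stream cut exactly between two members is indistinguishable from a complete single-member stream.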

  #7  
May 12th, 2008, 12:30 PM
Brian Smith

Henrik Nordstrom wrote:
>And in this case the protocol does not really depend on the delimiting
>being detected properly. The message delimiting is the same as identity
>encoding without Content-Length, by closing the connection. But unlike
>identity encoding the recipient can clearly tell if the message got
>unexpectedly truncated, with the small exception of a sender sending
>multiple gzip members in the same stream and the message getting
>truncated exactly between two members.

Does this mean that it isn't possible to use deflate or gzip
Transfer-Encoding on a persistent connection unless chunked encoding is
applied (last)? And, doesn't the following imply that any
transfer-encoding (besides identity) must be chunked? That is, "deflate"
would not be a valid Transfer-Encoding, but "deflate, chunked" would be.

2.If a Transfer-Encoding header field (section 14.41) is present and
has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6),
unless the message is terminated by closing the connection.

Regards,
Brian

  #8  
May 12th, 2008, 12:30 PM
Jamie Lokier

Brian Smith wrote:
>Does this mean that it isn't possible to use deflate or gzip
>Transfer-Encoding on a persistent connection unless chunked encoding is
>applied (last)? And, doesn't the following imply that any
>transfer-encoding (besides identity) must be chunked? That is, "deflate"
>would not be a valid Transfer-Encoding, but "deflate, chunked" would be.

I believe that's correct: to use deflate on a persistent connection,
you must have "Transfer-Encoding: deflate, chunked".

However, it is by far the common practice to use "Content-Encoding:
deflate" (or gzip) instead. User agents which decompress those send
the appropriate Accept-Encoding. In principle this reduces the work
required by a proxy, but limits the ability of a proxy to compress
some connections when one endpoint doesn't have the capability. But
in reality, Transfer-Encoding and Content-Encoding are virtually
interchangeable in this regard.

So that makes compression independent of transfer encoding.

But then there's this small problem of bugs in old servers and user
agents either setting or parsing Content-Length as the length _after_
compression, which you might want to avoid.

-- Jamie

  #9  
May 12th, 2008, 12:30 PM
Henrik Nordstrom

On Mon, 2008-05-12 at 09:54 -0700, Brian Smith wrote:

>Does this mean that it isn't possible to use deflate or gzip
>Transfer-Encoding on a persistent connection unless chunked encoding is
>applied (last)?

Correct. If transfer encodings are applied to the message and chunked
encoding isn't used, the connection MUST be closed after the message to
signal end-of-message.

>And, doesn't the following imply that any
>transfer-encoding (besides identity) must be chunked? That is, "deflate"
>would not be a valid Transfer-Encoding, but "deflate, chunked" would be.
>
>2.If a Transfer-Encoding header field (section 14.41) is present and
>has any value other than "identity", then the transfer-length is
>defined by use of the "chunked" transfer-coding (section 3.6),
>unless the message is terminated by closing the connection.

No. See the last part, "unless the message is terminated by closing
the connection". It's a bit vague, however, and some people have misread
it; that is what this issue is about.

See also the section on Transfer Encoding. The rules for message
delimiting when using transfer encoding are explained quite well there:

p1 3.4 Transfer Encodings

Whenever a transfer-coding is applied to a message-body, the set of
transfer-codings MUST include "chunked", unless the message is
terminated by closing the connection. When the "chunked" transfer-coding
is used, it MUST be the last transfer-coding applied to the
message-body. The "chunked" transfer-coding MUST NOT be applied more
than once to a message-body. These rules allow the recipient to
determine the transfer-length of the message (Section 4.4).


Regards
Henrik
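
The three MUST-level rules in the quoted p1 3.4 text can be written down as a small checker. This is a sketch only: `check_transfer_codings` is a hypothetical name, the caller is assumed to know whether the sender will close the connection after the message, and transfer-coding parameters are simply dropped.

```python
def check_transfer_codings(value: str, closes_connection: bool) -> list:
    """Return the list of rule violations for a Transfer-Encoding value."""
    # Keep only the coding names, in the order they were applied.
    codings = [
        item.split(";", 1)[0].strip().lower()
        for item in value.split(",")
        if item.strip()
    ]
    problems = []
    if codings.count("chunked") > 1:
        problems.append("chunked applied more than once")
    if "chunked" in codings and codings[-1] != "chunked":
        problems.append("chunked is not the last transfer-coding")
    if codings and "chunked" not in codings and not closes_connection:
        problems.append("no chunked coding and the connection stays open")
    return problems
```

For example, `check_transfer_codings("deflate", closes_connection=False)` reports a problem, while "deflate, chunked" on a persistent connection and plain "deflate" on a connection that will be closed both pass.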

  #10  
May 12th, 2008, 12:30 PM
Henrik Nordstrom

On Mon, 2008-05-12 at 18:06 +0100, Jamie Lokier wrote:

>However, it is by far the common practice to use "Content-Encoding:
>deflate" (or gzip) instead. User agents which decompress those send
>the appropriate Accept-Encoding. In principle this reduces the work
>required by a proxy, but limits the ability of a proxy to compress
>some connections when one endpoint doesn't have the capability. But
>in reality, Transfer-Encoding and Content-Encoding are virtually
>interchangeable in this regard.

But they are not. Content-Encoding creates a new entity, while
transfer-encoding is fully transparent.

>So that makes compression independent of transfer encoding.

?

>But then there's this small problem of bugs in old servers and user
>agents either setting or parsing Content-Length as the length _after_
>compression, which you might want to avoid.

Which is partly why the specs clearly say that if Transfer-Encoding is used
then Content-Length MUST be ignored, with the small exception for the
now removed case of "Transfer-Encoding: identity".

Regards
Henrik

  #11  
May 12th, 2008, 01:10 PM
Jamie Lokier

Henrik Nordstrom wrote:
>On Mon, 2008-05-12 at 18:06 +0100, Jamie Lokier wrote:
>
>>However, it is by far the common practice to use "Content-Encoding:
>>deflate" (or gzip) instead. User agents which decompress those send
>>the appropriate Accept-Encoding. In principle this reduces the work
>>required by a proxy, but limits the ability of a proxy to compress
>>some connections when one endpoint doesn't have the capability. But
>>in reality, Transfer-Encoding and Content-Encoding are virtually
>>interchangeable in this regard.
>
>But they are not. Content-Encoding creates a new entity, while
>transfer-encoding is fully transparent.

Sure, they are conceptually quite different. But in practice it can
be treated as a mere configuration issue.

The real work does not have to be different. A proxy can forward a
gzip transfer-encoded entity without decoding and reencoding it (same
work/buffering as content-encoding), and just as it can use a
different encoding on the incoming and outgoing links, it can also
(non-transparently) be configured to change the content-encoding. I
acknowledge issues with ETag, range requests, non-transparency etc.

Doing that is not a HTTP proxy per spec, but it is done nonetheless in
some configurations, and it is useful.

>>So that makes compression independent of transfer encoding.
>
>?

In practice.

>>But then there's this small problem of bugs in old servers and user
>>agents either setting or parsing Content-Length as the length _after_
>>compression, which you might want to avoid.
>
>Which is partly why the specs clearly say that if Transfer-Encoding is used
>then Content-Length MUST be ignored, with the small exception for the
>now removed case of "Transfer-Encoding: identity".

I was meaning Content-Length in conjunction with Content-Encoding, not
Transfer-Encoding.

-- Jamie

  #12  
May 12th, 2008, 05:10 PM
Henrik Nordstrom

On Mon, 2008-05-12 at 18:52 +0100, Jamie Lokier wrote:

>Doing that is not a HTTP proxy per spec, but it is done nonetheless in
>some configurations, and it is useful.

And breaking the evolution of HTTP quite noticeably. Try deploying, for
example, a WebDAV client behind such a transforming proxy, or a client
fetching ranges.

>>>So that makes compression independent of transfer encoding.
>>
>>?
>
>In practice.

I disagree. There is a lot more to HTTP than plain browsing, and these
proxies bending HTTP often do so without knowing HTTP or the bad
effects they cause, and the ones deploying them often consider HTTP
"browsing only, nothing critical if it gets a bit messed up as long as
browsing to the major sites works".

>>Which is partly why the specs clearly say that if Transfer-Encoding is used
>>then Content-Length MUST be ignored, with the small exception for the
>>now removed case of "Transfer-Encoding: identity".
>
>I was meaning Content-Length in conjunction with Content-Encoding, not
>Transfer-Encoding.

And where is the confusion there?

Content-Length with Content-Encoding is the message length, nothing
else. Anyone getting this wrong is seriously flawed.

Content-Encoding is a property of the resource returned, not of how it's
transferred. Content-Encoding does NOT change the message format, only
the resource transferred. To the protocol it is very similar to
Content-Language or Content-Type, but on a different axis.

Regards
Henrik
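
To make the "Content-Length is the message length" point concrete, here is a minimal sketch of a response built with Content-Encoding: gzip. The function name `build_gzip_response` is hypothetical. The point is only that Content-Length counts the bytes actually sent in the message body, i.e. the gzip-encoded entity, not the size of the resource before encoding.

```python
import gzip


def build_gzip_response(resource: bytes) -> bytes:
    """Build a complete response whose entity is the gzip-coded resource."""
    # With Content-Encoding, the encoded bytes *are* the entity being sent.
    entity = gzip.compress(resource)
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Encoding: gzip\r\n"
        # Length of the body as transferred, not len(resource).
        f"Content-Length: {len(entity)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + entity
```

A recipient that treats this Content-Length as the decoded size would read the wrong number of bytes from the connection.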

  #13  
May 12th, 2008, 05:10 PM
Jamie Lokier

Henrik Nordstrom wrote:
>On Mon, 2008-05-12 at 18:52 +0100, Jamie Lokier wrote:
>
>>Doing that is not a HTTP proxy per spec, but it is done nonetheless in
>>some configurations, and it is useful.
>
>And breaking the evolution of HTTP quite noticeably. Try deploying, for
>example, a WebDAV client behind such a transforming proxy, or a client
>fetching ranges.

If a WebDAV client says "Accept-Encoding: gzip" it will probably get
similar issues even with no proxy.

Many generic HTTP servers act as transformative "pseudo-proxies" to
their backend content - consider Apache with mod_gzip for example.
Therefore, WebDAV clients for general purpose use should not say
"Accept-Encoding: gzip" unless they handle the consequences, which
typically means transparently decompressing what's received. They
don't have to, but user expectations won't be met when connecting to
some servers, and editing may fail.

Range requests: if the proxy is written properly it can work.

HTTP evolution: Proxies for general HTTP use, such as at ISPs and
gateways, should not be configured that way.

Only proxies for specific applications would enable transformations
like that (we hope). An example is Apache with mod_gzip+mod_proxy
acting as a reverse proxy in a server farm (I don't know if that
really works). That is why I call it a configuration issue.

>>>>So that makes compression independent of transfer encoding.
>>>
>>>?
>>
>>In practice.
>
>I disagree. There is a lot more to HTTP than plain browsing, and these
>proxies bending HTTP often do so without knowing HTTP or the bad
>effects they cause, and the ones deploying them often consider HTTP
>"browsing only, nothing critical if it gets a bit messed up as long as
>browsing to the major sites works".

This is more like "as long as using major browsers (site irrelevant)
works, or as long as using a client intended to generally work with
sites found on the net (because mod_gzip is popular enough that even
non-browser clients must work with it, or not use Accept-Encoding)."
It is indeed dirty, but not as specifically dirty as you make out.

It's also not common to do this in proxies, so don't worry about it.
What is common is automatic compression a la mod_gzip, in what is
technically not a HTTP proxy, but is still a generic relay between
HTTP client and HTTP services, and similar non-transparency issues do
apply there.

Besides, I bet a HTTP proxy which opportunistically applies
"Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip"
to requests removing the encoding from forwarded responses, will cause
problems too - maybe even bigger ones - even though it's fully
compliant and transparent according to spec.

>>>Which is partly why the specs clearly say that if Transfer-Encoding is used
>>>then Content-Length MUST be ignored, with the small exception for the
>>>now removed case of "Transfer-Encoding: identity".
>>
>>I was meaning Content-Length in conjunction with Content-Encoding, not
>>Transfer-Encoding.
>
>And where is the confusion there?
>
>Content-Length with Content-Encoding is the message length, nothing
>else. Anyone getting this wrong is seriously flawed.
>
>Content-Encoding is a property of the resource returned, not of how it's
>transferred. Content-Encoding does NOT change the message format, only
>the resource transferred. To the protocol it is very similar to
>Content-Language or Content-Type, but on a different axis.

I know. But spec isn't everything.

The serious flaw is deployed. I'm not surprised - it's a predictable
mistake given how HTTP systems are architected. When writing code you
can't ignore the installed base of buggy agents if you want to
interoperate. But as I've implied, that particular bug is found (as
far as I know) only in old agents which are dwindling in presence, so
you might choose to ignore it now, depending on how much you care
about reaching those remaining.

-- Jamie

  #14  
May 12th, 2008, 06:30 PM
Henrik Nordstrom

On Mon, 2008-05-12 at 22:27 +0100, Jamie Lokier wrote:

>HTTP evolution: Proxies for general HTTP use, such as at ISPs and
>gateways, should not be configured that way.

And yet there are plenty doing this, especially in bandwidth-scarce parts
of the world, by looping all their traffic via a gzip proxy in a
well-connected co-location, and some major proxy vendors advertise it as a
feature.

>It's also not common to do this in proxies, so don't worry about it.
>What is common is automatic compression a la mod_gzip, in what is
>technically not a HTTP proxy, but is still a generic relay between
>HTTP client and HTTP services, and similar non-transparency issues do
>apply there.

I know, as can be seen in my discussions with the Apache team.

>Besides, I bet a HTTP proxy which opportunistically applies
>"Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip"
>to requests removing the encoding from forwarded responses, will cause
>problems too - maybe even bigger ones - even though it's fully
>compliant and transparent according to spec.

Not sure on that. There aren't many implementing TE: gzip today.

>I know. But spec isn't everything.

Not short-term, no, but when working with a timespan of several years
it's important.

>The serious flaw is deployed. I'm not surprised - it's a predictable
>mistake given how HTTP systems are architected. When writing code you
>can't ignore the installed base of buggy agents if you want to
>interoperate. But as I've implied, that particular bug is found (as
>far as I know) only in old agents which are dwindling in presence, so
>you might choose to ignore it now, depending on how much you care
>about reaching those remaining.

Personally I have very little respect for old broken user agents. Those
generally have major gaping security flaws as well and really SHOULD get
upgraded.

I care more about broken servers.

Regards
Henrik

  #15  
May 13th, 2008, 10:31 AM
Jamie Lokier

Henrik Nordstrom wrote:
>>HTTP evolution: Proxies for general HTTP use, such as at ISPs and
>>gateways, should not be configured that way.
>
>And yet there are plenty doing this, especially in bandwidth-scarce parts
>of the world, by looping all their traffic via a gzip proxy in a
>well-connected co-location, and some major proxy vendors advertise it as a
>feature.

Interesting. I didn't realise those exist. It's quite
understandable: cheap fast web browsing is important, and
Transfer-Encoding is widely supported enough agents to be used.

HTTP uses on those networks are relatively unimportant and will
have to work around those transformations. Fortunately it is quite
straightforward and the least of one's worries - there's far more
annoying HTTP-in-reality issues!

I rather like the idea of proxies which use delta compression between
each other. Infinitely more effective. But alas not zero-install for
the end users.

>>Besides, I bet a HTTP proxy which opportunistically applies
>>"Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip"
>>to requests removing the encoding from forwarded responses, will cause
>>problems too - maybe even bigger ones - even though it's fully
>>compliant and transparent according to spec.
>
>Not sure on that. There aren't many implementing TE: gzip today.

>>I know. But spec isn't everything.
>
>Not short-term, no, but when working with a timespan of several years
>it's important.

I agree, but I think the unfolding reality suggests ways the spec
doesn't match the ways people want to actually use HTTP. We push
sound principles but we work with the ecosystem too. Same reason
we're writing applications in HTML + Javascript even though it's messy
and slow.

>>The serious flaw is deployed. I'm not surprised - it's a predictable
>>mistake given how HTTP systems are architected. When writing code you
>>can't ignore the installed base of buggy agents if you want to
>>interoperate. But as I've implied, that particular bug is found (as
>>far as I know) only in old agents which are dwindling in presence, so
>>you might choose to ignore it now, depending on how much you care
>>about reaching those remaining.
>
>Personally I have very little respect for old broken user agents. Those
>generally have major gaping security flaws as well and really SHOULD get
>upgraded.
>
>I care more about broken servers.

From Google, I have the impression the problem is more prevalent on
servers, if you're counting different programs (as opposed to number
of running instances). But that it's also rare, occurring around the
time people weren't really using compression and just starting to
dabble in it.

-- Jamie

  #16  
May 13th, 2008, 12:10 PM
Jamie Lokier

Jamie Lokier wrote:
>>And yet there are plenty doing this, especially in bandwidth-scarce parts
>>of the world, by looping all their traffic via a gzip proxy in a
>>well-connected co-location, and some major proxy vendors advertise it as a
>>feature.
>
>Interesting. I didn't realise those exist. It's quite
>understandable: cheap fast web browsing is important, and
>Transfer-Encoding is widely supported enough agents to be used.
^^ "in"

I forgot "not" and "in".

Cheers,
-- Jamie

  #17  
May 13th, 2008, 12:50 PM
Jamie Lokier

Adrien de Croy wrote:
>Transfer-encoding and Content-Encoding are fundamentally different. It
>helps if you look at it from the point of view of who does the encoding.

Agreed, but I think everyone who spoke in this thread knows the
difference. The discussion hasn't denied that. I can see why it
might look that way.

>Transfer-Encoding is performed on the fly by something in the stream
>(e.g. proxy or output conversion process).

No, that's an implementation detail outside the scope of HTTP. It
_can_ be implemented that way, but HTTP does not say anything about
that or require it.

>In such cases it's often impossible (i.e. non-deterministic length
>of output of encoding) to know the length of the whole transformed
>entity.

It's often impossible, but it's often possible. Agents can internally
cache gzip transfer-encoded bodies and HTTP permits that (it says
nothing about it).

>Content-Encoding is different because the sender should know the length,
>therefore can set Content-Length headers.

No, that's another implementation detail outside the scope of HTTP.
HTTP does not require the sender to know the length; it places no such
requirement, not even in principle.

In practice, many senders using Content-Encoding: gzip don't know the
length when they start sending, and use chunked encoding or close the
connection, which is allowed.
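
A sender in that position might look roughly like the sketch below: it gzips the content as it is produced and frames the output with the chunked transfer-coding, so no total length is needed up front. The name `gzip_chunked` is hypothetical, and the response headers (Content-Encoding: gzip, Transfer-Encoding: chunked) are assumed to have been sent already.

```python
import zlib


def gzip_chunked(parts):
    """Yield chunked-framed pieces of a gzip stream built from `parts`."""
    # 16 + MAX_WBITS makes zlib emit a gzip header and trailer.
    comp = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    for part in parts:
        data = comp.compress(part)
        # The compressor may buffer input; a zero-length chunk would end
        # the body, so empty outputs are skipped.
        if data:
            yield b"%x\r\n%s\r\n" % (len(data), data)
    data = comp.flush()
    if data:
        yield b"%x\r\n%s\r\n" % (len(data), data)
    # last-chunk with no trailer fields marks the end of the message.
    yield b"0\r\n\r\n"
```

Typical use would be writing each yielded piece to the connection as soon as it is available, for example `for piece in gzip_chunked(source): sock.sendall(piece)` with `source` and `sock` supplied by the caller.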

>It is deemed a separate entity - an attribute of which is an
>encoding, but as far as HTTP is concerned it may as well not be
>encoded. The encoding is meant for the end consumer of the message.

That's all correct, but it has no bearing on how servers, clients