#1
i28 proposed replacement text
ref: #i28
In section 4.4 the current text is:

<<<
2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection.

Here is the proposed clarification text:

<<<
2: If a Transfer-Encoding header field (Section 14.41) is present, and has any value other than "identity", then the "chunked" transfer-coding is used, then the transfer-length is defined by use of the "chunked" transfer-coding (Section 3.6)

Even if any value other than "identity" implies that "chunked" MUST be here, this wording is less open to interpretation. The "unless the message is terminated by closing the connection" would mean that we are using Transfer-Encoding on an HTTP/1.0 connection and closing the connection to signal the end; that case is already covered by item 5, with the caveat right after "For compatibility with HTTP/1.0 applications".

--
Baroula que barouleras, au toujou t'entourneras.
~~Yves
#2
Thu, 3 Apr 2008, Mark Nottingham wrote:
> Yves,
>
> Do you mean:
>
> > 2: If a Transfer-Encoding header field (Section 14.41) is present, and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (Section 3.6).
>
> ? (I think there was a cut-and-paste error, perhaps).

Yes, as there is already a MUST (use chunked transfer-coding) in part 02, section 3.4.

> If so, +1.
>
> comment -- this is good, but my reading of the "unless the message is terminated" text is that it was intended to cover the case where the connection is prematurely closed. It may be useful to have text added below the numbered list along these lines;
>
> """
> If a message has a defined length (e.g., using chunked encoding, Content-Length, or multipart/byteranges), and the connection is prematurely closed, then the transfer-length will be less than indicated, and the message is incomplete.
> """

+1. We can also add that it's an error (even if it seems obvious).

> This leaves the question open of whether we want to place any additional requirements on incomplete messages (which should probably be considered separately).

We can't say, from such an error, whether it is used to signal an error or is an unplanned error. There is also the issue of partially caching such responses to a GET, or what to do when it's a POST. It all goes in the "error recovery" bucket. We have some error recovery (as in part 1, 7.2.4); something along the same lines for this case should be ok.

Cheers,

--
Baroula que barouleras, au toujou t'entourneras.
~~Yves
#3
On Mon, 2008-05-12 at 05:42 -0400, Yves Lafon wrote:
> > 3. If a Transfer-Encoding header field (Section 14.41) is present, and has a value other than "identity", then the transfer-length is defined by closing the connection.
>
> And in that case, you have no guarantee that the message is complete or not (unless you use a Transfer-Encoding that gives that property, if the recipient knows about this encoding, which is not granted).

, except for the last part. Closing the connection as a delimiting method is only possible on responses, and there any transfer encoding other than chunked is negotiated using TE.

Both gzip and deflate (zlib) are self-delimiting and checksummed, by the way, giving you this property. So the only transfer encoding available today in HTTP/1.1 not having this property is the "identity" non-encoding without Content-Length.

Regards
Henrik
#4
Mon, 12 May 2008, Henrik Nordstrom wrote:
> > Here is the proposed clarification text:
> > <<<
> > 2: If a Transfer-Encoding header field (Section 14.41) is present, and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (Section 3.6)
>
> No, this is wrong. See the section on transfer encoding.
>
> The following message header is perfectly legal and is what the "closing the connection" part is about:
>
>   HTTP/1.1 200 OK
>   Transfer-Encoding: gzip
>   Connection: close
>
> What it should read is something like the following:
>
> 2. If a Transfer-Encoding header field (Section 14.41) is present, and indicates that "chunked" was the last encoding applied to the message-body, then the transfer-length is defined by use of the "chunked" transfer-coding (Section 3.6).
>
> 3. If a Transfer-Encoding header field (Section 14.41) is present, and has a value other than "identity", then the transfer-length is defined by closing the connection.

And in that case, you have no guarantee that the message is complete or not (unless you use a Transfer-Encoding that gives that property, if the recipient knows about this encoding, which is not granted).

> Side Note: It's illegal to apply any transfer-encoding after chunked encoding simply because it would be a complete waste. Technically the message format does support inner levels of chunked encoding.

--
Baroula que barouleras, au toujou t'entourneras.
~~Yves
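The two rules suggested in this post (items 2 and 3) can be read as a small decision procedure. The following is only a sketch of that reading, not spec text: the function name `transfer_length_rule` and its return strings are made up for illustration, and it ignores transfer-coding parameters.

```python
from typing import Optional


def transfer_length_rule(transfer_encoding: Optional[str]) -> str:
    """Say how the transfer-length is delimited under the wording proposed above.

    Hypothetical helper: it only looks at the Transfer-Encoding header value
    and ignores any parameters attached to individual codings.
    """
    if not transfer_encoding:
        return "other rules apply"

    # Split "gzip, chunked" into ["gzip", "chunked"], in the order applied.
    codings = [c.strip().lower() for c in transfer_encoding.split(",") if c.strip()]
    if not codings or codings == ["identity"]:
        return "other rules apply"

    # Proposed item 2: "chunked" was the last coding applied.
    if codings[-1] == "chunked":
        return "chunked"

    # Proposed item 3: some other coding, so the end is marked by closing.
    return "connection close"
```

With this sketch, `Transfer-Encoding: gzip` together with `Connection: close` (the example header above) falls under item 3, while `Transfer-Encoding: gzip, chunked` falls under item 2.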
#5
Henrik Nordstrom wrote:
> Both gzip and deflate (zlib) are self-delimiting and checksummed, by the way, giving you this property.

Are you sure? Concatenating gzip files (.gz) is allowed: when decompressed it results in the concatenation of the decompressed parts. Therefore gzip _files_ aren't self-delimiting.

I don't know if gzip-as-referenced-by-HTTP allows that, but given that gzip files can, it would be inadvisable for the network protocol to be different and depend on that difference.

-- Jamie
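The concatenation behaviour of gzip files is easy to check with Python's standard `gzip` module. This is just an illustration of the file-format point; the helper name `join_members` is made up, and it says nothing about what HTTP itself permits.

```python
import gzip


def join_members(*payloads: bytes) -> bytes:
    """Compress each payload as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(p) for p in payloads)


# Two independently compressed members, glued together byte for byte.
combined = join_members(b"hello, ", b"world")

# Decompressing the result yields the concatenation of the parts.
assert gzip.decompress(combined) == b"hello, world"
```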
#6
On Mon, 2008-05-12 at 16:10 +0100, Jamie Lokier wrote:
> Are you sure? Concatenating gzip files (.gz) is allowed: when decompressed it results in the concatenation of the decompressed parts. Therefore gzip _files_ aren't self-delimiting.

True. But each part/member is, and if the sender sends a single member it can be sure that the recipient can tell if the message got truncated, even if there are no other forms of delimiting. It's the sender that chooses to send the message without any other form of delimiting.

And in this case the protocol does not really depend on the delimiting being detected properly. The message delimiting is the same as for identity encoding without Content-Length: closing the connection. But unlike identity encoding, the recipient can clearly tell if the message got unexpectedly truncated, with the small exception of a sender sending multiple gzip members in the same stream and the message getting truncated exactly between two members.

Regards
Henrik
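A rough way to see both the property and its exception is to walk a byte string member by member with `zlib` and check whether every member reached its end. This is a sketch under the assumption that the body consists of gzip members only; the function name `gzip_stream_complete` is hypothetical.

```python
import gzip
import zlib


def gzip_stream_complete(data: bytes) -> bool:
    """Return True if data is one or more whole gzip members and nothing else."""
    if not data:
        return False
    remaining = data
    while remaining:
        member = zlib.decompressobj(31)  # wbits=31 selects the gzip wrapper
        try:
            member.decompress(remaining)
        except zlib.error:
            return False  # not gzip data, or corrupted
        if not member.eof:
            return False  # this member was cut short
        remaining = member.unused_data  # bytes after the member's trailer
    return True


one = gzip.compress(b"x" * 1000)
two = one + gzip.compress(b"y" * 1000)

assert gzip_stream_complete(one)
assert not gzip_stream_complete(one[:-5])  # truncated inside the single member
assert gzip_stream_complete(two[:len(one)])  # cut exactly between members: looks complete
```

The last assertion is the "small exception" mentioned above: a stream cut exactly at a member boundary is indistinguishable from a shorter, complete one.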
#7
Henrik Nordstrom wrote:
> And in this case the protocol does not really depend on the delimiting being detected properly. The message delimiting is the same as for identity encoding without Content-Length: closing the connection. But unlike identity encoding, the recipient can clearly tell if the message got unexpectedly truncated, with the small exception of a sender sending multiple gzip members in the same stream and the message getting truncated exactly between two members.

Does this mean that it isn't possible to use deflate or gzip Transfer-Encoding on a persistent connection unless chunked encoding is applied (last)?

And, doesn't the following imply that any transfer-encoding (besides identity) must be chunked? That is, "deflate" would not be a valid Transfer-Encoding, but "deflate, chunked" would be.

    2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection.

Regards,
Brian
#8
Brian Smith wrote:
> Does this mean that it isn't possible to use deflate or gzip Transfer-Encoding on a persistent connection unless chunked encoding is applied (last)?
>
> And, doesn't the following imply that any transfer-encoding (besides identity) must be chunked? That is, "deflate" would not be a valid Transfer-Encoding, but "deflate, chunked" would be.

I believe that's correct: to use deflate on a persistent connection, you must have "Transfer-Encoding: deflate, chunked".

However, it is by far the common practice to use "Content-Encoding: deflate" (or gzip) instead. User agents which decompress those send the appropriate Accept-Encoding. In principle this reduces the work required by a proxy, but limits the ability of a proxy to compress some connections when one endpoint doesn't have the capability. But in reality, Transfer-Encoding and Content-Encoding are virtually interchangeable in this regard.

So that makes compression independent of transfer encoding. But then there's this small problem of bugs in old servers and user agents either setting or parsing Content-Length as the length _after_ compression, which you might want to avoid.

-- Jamie
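For concreteness, here is roughly what a body sent with "Transfer-Encoding: deflate, chunked" looks like on the wire: deflate (zlib format) is applied first, and chunked framing last. This is a minimal sketch without chunk extensions or trailers; the helper names `chunked`, `dechunk` and `deflate_chunked` are made up for the example.

```python
import zlib


def chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Frame body as chunks: hex size, CRLF, data, CRLF, then a zero-size chunk."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, with no trailer fields
    return bytes(out)


def dechunk(wire: bytes) -> bytes:
    """Inverse of chunked() for well-formed input without extensions or trailers."""
    body = bytearray()
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol], 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += wire[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


def deflate_chunked(body: bytes, chunk_size: int = 1024) -> bytes:
    """Apply deflate first and chunked last, as in 'deflate, chunked'."""
    return chunked(zlib.compress(body), chunk_size)


payload = b"hello " * 1000
wire = deflate_chunked(payload, chunk_size=16)
assert zlib.decompress(dechunk(wire)) == payload
```

Because the zero-size chunk marks the end of the body, the connection can stay open afterwards, which is the point being made about persistent connections.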
#9
On Mon, 2008-05-12 at 09:54 -0700, Brian Smith wrote:
> Does this mean that it isn't possible to use deflate or gzip Transfer-Encoding on a persistent connection unless chunked encoding is applied (last)?

Correct. If transfer encodings are applied to the message and chunked encoding isn't used, the connection MUST be closed after the message to signal end-of-message.

> And, doesn't the following imply that any transfer-encoding (besides identity) must be chunked? That is, "deflate" would not be a valid Transfer-Encoding, but "deflate, chunked" would be.
>
>     2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection.

No. See the last part, "unless the message is terminated by closing the connection". It's a bit vague, however, and some people have misread it, and that is what this issue is about.

See also the section on Transfer Encoding. The rules for message delimiting when using transfer encoding are explained quite well there:

    p1 3.4 Transfer Encodings

    Whenever a transfer-coding is applied to a message-body, the set of transfer-codings MUST include "chunked", unless the message is terminated by closing the connection. When the "chunked" transfer-coding is used, it MUST be the last transfer-coding applied to the message-body. The "chunked" transfer-coding MUST NOT be applied more than once to a message-body. These rules allow the recipient to determine the transfer-length of the message (Section 4.4).

Regards
Henrik
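The three requirements in the quoted p1 3.4 text can be written down as a small checker. This is only an illustration of the quoted rules as read here; the function name `check_transfer_codings` and the message strings are hypothetical.

```python
from typing import List


def check_transfer_codings(codings: List[str], closes_connection: bool) -> List[str]:
    """Return the rules from the quoted text that a sender would violate.

    codings lists the transfer-codings in the order they were applied;
    closes_connection says whether the message ends by closing the connection.
    """
    names = [c.strip().lower() for c in codings]
    problems = []

    # "MUST include chunked, unless the message is terminated by closing"
    if names and "chunked" not in names and not closes_connection:
        problems.append("chunked missing and connection not closed")

    # "it MUST be the last transfer-coding applied"
    if "chunked" in names and names[-1] != "chunked":
        problems.append("chunked is not the last coding")

    # "MUST NOT be applied more than once"
    if names.count("chunked") > 1:
        problems.append("chunked applied more than once")

    return problems


assert check_transfer_codings(["deflate", "chunked"], closes_connection=False) == []
assert check_transfer_codings(["gzip"], closes_connection=True) == []
assert check_transfer_codings(["gzip"], closes_connection=False) != []
```

Under this reading, "Transfer-Encoding: deflate" alone is acceptable only when the connection is closed afterwards, which matches the answer given above.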
#10
On Mon, 2008-05-12 at 18:06 +0100, Jamie Lokier wrote:
> However, it is by far the common practice to use "Content-Encoding: deflate" (or gzip) instead. User agents which decompress those send the appropriate Accept-Encoding. In principle this reduces the work required by a proxy, but limits the ability of a proxy to compress some connections when one endpoint doesn't have the capability. But in reality, Transfer-Encoding and Content-Encoding are virtually interchangeable in this regard.

But they are not. Content-Encoding creates a new entity, while transfer-encoding is fully transparent.

> So that makes compression independent of transfer encoding.

?

> But then there's this small problem of bugs in old servers and user agents either setting or parsing Content-Length as the length _after_ compression, which you might want to avoid.

Which is partly why the specs clearly say that if Transfer-Encoding is used then Content-Length MUST be ignored, with the small exception for the now removed case of "Transfer-Encoding: identity".

Regards
Henrik
#11
Henrik Nordstrom wrote:
> On Mon, 2008-05-12 at 18:06 +0100, Jamie Lokier wrote:
> > However, it is by far the common practice to use "Content-Encoding: deflate" (or gzip) instead. User agents which decompress those send the appropriate Accept-Encoding. In principle this reduces the work required by a proxy, but limits the ability of a proxy to compress some connections when one endpoint doesn't have the capability. But in reality, Transfer-Encoding and Content-Encoding are virtually interchangeable in this regard.
>
> But they are not. Content-Encoding creates a new entity, while transfer-encoding is fully transparent.

Sure, they are conceptually quite different. But in practice it can be treated as a mere configuration issue. The real work does not have to be different.

A proxy can forward a gzip transfer-encoded entity without decoding and re-encoding it (same work/buffering as content-encoding), and just as it can use a different encoding on the incoming and outgoing links, it can also (non-transparently) be configured to change the content-encoding. I acknowledge issues with ETag, range requests, non-transparency etc.

Doing that is not a HTTP proxy per spec, but it is done nonetheless in some configurations, and it is useful.

> > So that makes compression independent of transfer encoding.
>
> ?

In practice.

> > But then there's this small problem of bugs in old servers and user agents either setting or parsing Content-Length as the length _after_ compression, which you might want to avoid.
>
> Which is partly why the specs clearly say that if Transfer-Encoding is used then Content-Length MUST be ignored, with the small exception for the now removed case of "Transfer-Encoding: identity".

I was meaning Content-Length in conjunction with Content-Encoding, not Transfer-Encoding.

-- Jamie
#12
On Mon, 2008-05-12 at 18:52 +0100, Jamie Lokier wrote:
> Doing that is not a HTTP proxy per spec, but it is done nonetheless in some configurations, and it is useful.

And breaking the evolution of HTTP quite noticeably. Try deploying, for example, a WebDAV client behind such a transforming proxy, or a client fetching ranges.

> > > So that makes compression independent of transfer encoding.
> >
> > ?
>
> In practice.

I disagree. There is a lot more to HTTP than plain browsing, and these proxies bending HTTP often do so without knowing HTTP or the bad effects they cause, and the ones deploying them often consider HTTP "browsing only, nothing critical if it gets a bit messed up as long as browsing to the major sites works".

> > Which is partly why the specs clearly say that if Transfer-Encoding is used then Content-Length MUST be ignored, with the small exception for the now removed case of "Transfer-Encoding: identity".
>
> I was meaning Content-Length in conjunction with Content-Encoding, not Transfer-Encoding.

And where is the confusion there? Content-Length with Content-Encoding is the message length, nothing else. Anyone getting this wrong is seriously flawed.

Content-Encoding is a property of the resource returned, not of how it's transferred. Content-Encoding does NOT change the message format, only the resource transferred. To the protocol it is very similar to Content-Language or Content-Type, but on a different axis.

Regards
Henrik
#13
Henrik Nordstrom wrote:
> On Mon, 2008-05-12 at 18:52 +0100, Jamie Lokier wrote:
> > Doing that is not a HTTP proxy per spec, but it is done nonetheless in some configurations, and it is useful.
>
> And breaking the evolution of HTTP quite noticeably. Try deploying, for example, a WebDAV client behind such a transforming proxy, or a client fetching ranges.

If a WebDAV client says "Accept-Encoding: gzip" it will probably get similar issues even with no proxy. Many generic HTTP servers act as transformative "pseudo-proxies" to their backend content - consider Apache with mod_gzip for example. Therefore, WebDAV clients for general purpose use should not say "Accept-Encoding: gzip" unless they handle the consequences, which typically means transparently decompressing what's received. They don't have to, but user expectations won't be met when connecting to some servers, and editing may fail.

Range requests: if the proxy is written properly it can work.

HTTP evolution: Proxies for general HTTP use, such as at ISPs and gateways, should not be configured that way. Proxies for specific applications would enable transformations like that (we hope). An example is Apache with mod_gzip+mod_proxy acting as a reverse proxy in a server farm (I don't know if that really works). That is why I call it a configuration issue.

> > > > So that makes compression independent of transfer encoding.
> > >
> > > ?
> >
> > In practice.
>
> I disagree. There is a lot more to HTTP than plain browsing, and these proxies bending HTTP often do so without knowing HTTP or the bad effects they cause, and the ones deploying them often consider HTTP "browsing only, nothing critical if it gets a bit messed up as long as browsing to the major sites works".

This is more like "as long as using major browsers (site irrelevant) works, or as long as using a client intended to generally work with sites found on the net (because mod_gzip is popular enough that even non-browser clients must work with it, or not use Accept-Encoding)."

It is indeed dirty, but not as specifically dirty as you make out.

It's also not common to do this in proxies, so don't worry about it. What is common is automatic compression a la mod_gzip, in what is technically not a HTTP proxy, but is still a generic relay between HTTP client and HTTP services, and similar non-transparency issues do apply there.

Besides, I bet a HTTP proxy which opportunistically applies "Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip" to requests removing the encoding from forwarded responses, will cause problems too - maybe even bigger ones - even though it's fully compliant and transparent according to spec.

> > > Which is partly why the specs clearly say that if Transfer-Encoding is used then Content-Length MUST be ignored, with the small exception for the now removed case of "Transfer-Encoding: identity".
> >
> > I was meaning Content-Length in conjunction with Content-Encoding, not Transfer-Encoding.
>
> And where is the confusion there? Content-Length with Content-Encoding is the message length, nothing else. Anyone getting this wrong is seriously flawed.
>
> Content-Encoding is a property of the resource returned, not of how it's transferred. Content-Encoding does NOT change the message format, only the resource transferred. To the protocol it is very similar to Content-Language or Content-Type, but on a different axis.

I know. But spec isn't everything. The serious flaw is deployed. I'm not surprised - it's a predictable mistake given how HTTP systems are architected. When writing code you can't ignore the installed base of buggy agents if you want to interoperate.

But as I've implied, that particular bug is found (as far as I know) only in old agents which are dwindling in presence, so you might choose to ignore it now, depending on how much you care about reaching those remaining.

-- Jamie
#14
On Mon, 2008-05-12 at 22:27 +0100, Jamie Lokier wrote:
> HTTP evolution: Proxies for general HTTP use, such as at ISPs and gateways, should not be configured that way.

And yet there are plenty doing this, especially in bandwidth-scarce parts of the world, by looping all their traffic via a gzip proxy in a well-connected co-location, and some major proxy vendors advertise it as a feature.

> It's also not common to do this in proxies, so don't worry about it. What is common is automatic compression a la mod_gzip, in what is technically not a HTTP proxy, but is still a generic relay between HTTP client and HTTP services, and similar non-transparency issues do apply there.

I know, as can be seen in my discussions with the Apache team.

> Besides, I bet a HTTP proxy which opportunistically applies "Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip" to requests removing the encoding from forwarded responses, will cause problems too - maybe even bigger ones - even though it's fully compliant and transparent according to spec.

Not sure on that. There aren't many implementing TE: gzip today.

> I know. But spec isn't everything.

Not short-term, no, but when working with a timespan of several years it's important.

> The serious flaw is deployed. I'm not surprised - it's a predictable mistake given how HTTP systems are architected. When writing code you can't ignore the installed base of buggy agents if you want to interoperate. But as I've implied, that particular bug is found (as far as I know) only in old agents which are dwindling in presence, so you might choose to ignore it now, depending on how much you care about reaching those remaining.

Personally I have very little respect for old broken user agents. Those generally have major gaping security flaws as well and really SHOULD get upgraded. I care more about broken servers.

Regards
Henrik
#15
Henrik Nordstrom wrote:
> > HTTP evolution: Proxies for general HTTP use, such as at ISPs and gateways, should not be configured that way.
>
> And yet there are plenty doing this, especially in bandwidth-scarce parts of the world, by looping all their traffic via a gzip proxy in a well-connected co-location, and some major proxy vendors advertise it as a feature.

Interesting. I didn't realise those exist. It's quite understandable: cheap fast web browsing is important, and Transfer-Encoding is widely supported enough agents to be used. HTTP uses on those networks are relatively unimportant and will have to work around those transformations. Fortunately it is quite straightforward and the least of one's worries - there are far more annoying HTTP-in-reality issues!

I rather like the idea of proxies which use delta compression between each other. Infinitely more effective. But alas not zero-install for the end users.

> > Besides, I bet a HTTP proxy which opportunistically applies "Transfer-Encoding: gzip" encoding when permitted, and adds "TE: gzip" to requests removing the encoding from forwarded responses, will cause problems too - maybe even bigger ones - even though it's fully compliant and transparent according to spec.
>
> Not sure on that. There aren't many implementing TE: gzip today.
>
> > I know. But spec isn't everything.
>
> Not short-term, no, but when working with a timespan of several years it's important.

I agree, but I think the unfolding reality suggests ways the spec doesn't match the ways people want to actually use HTTP. We push sound principles but we work with the ecosystem too. Same reason we're writing applications in HTML + Javascript even though it's messy and slow.

> > The serious flaw is deployed. I'm not surprised - it's a predictable mistake given how HTTP systems are architected. When writing code you can't ignore the installed base of buggy agents if you want to interoperate. But as I've implied, that particular bug is found (as far as I know) only in old agents which are dwindling in presence, so you might choose to ignore it now, depending on how much you care about reaching those remaining.
>
> Personally I have very little respect for old broken user agents. Those generally have major gaping security flaws as well and really SHOULD get upgraded. I care more about broken servers.

From Google, I have the impression the problem is more prevalent on servers, if you're counting different programs (as opposed to number of running instances). But it's also rare, occurring around the time people weren't really using compression and were just starting to dabble in it.

-- Jamie
#16
Jamie Lokier wrote:
> > And yet there are plenty doing this, especially in bandwidth-scarce parts of the world, by looping all their traffic via a gzip proxy in a well-connected co-location, and some major proxy vendors advertise it as a feature.
>
> Interesting. I didn't realise those exist. It's quite understandable: cheap fast web browsing is important, and Transfer-Encoding is widely supported enough agents to be used.

^^ "in"

I forgot "not" and "in".

Cheers,
-- Jamie
#17
Adrien de Croy wrote:

> Transfer-encoding and Content-Encoding are fundamentally different. It helps if you look at it from the point of view of who does the encoding.

Agreed, but I think everyone who spoke in this thread knows the difference. The discussion hasn't denied that. I can see why it might look that way.

> Transfer-Encoding is performed on the fly by something in the stream (e.g. proxy or output conversion process).

No, that's an implementation detail outside the scope of HTTP. It _can_ be implemented that way, but HTTP does not say anything about that or require it.

> In such cases it's often impossible (i.e. non-deterministic length of output of encoding) to know the length of the whole transformed entity.

It's often impossible, but it's often possible. Agents can internally cache gzip transfer-encoded bodies and HTTP permits that (it says nothing about it).

> Content-Encoding is different because the sender should know the length, therefore can set Content-Length headers.

No, that's another implementation detail outside the scope of HTTP. HTTP does not require the sender to know the length; it places no such requirement, not even in principle. In practice, many senders using Content-Encoding: gzip don't know the length when they start sending, and use chunked encoding or close the connection, which is allowed.

> It is deemed a separate entity - an attribute of which is an encoding, but as far as HTTP is concerned it may as well not be encoded. The encoding is meant for the end consumer of the message.

That's all correct, but it has no bearing on how servers, clients