#1
Why Microsoft's authoritative=true won't work and is a bad idea
Ian Hickson wrote:
> If you would like the document to be processed as plain text, then there might not be a good answer for you, sorry. Your use case is incompatible with the use case of the many users who want to see feeds sent as text/plain handled as feeds. Enough people mislabel their feeds as text/plain that in practice documents labeled as text/plain are, in some browsers, sniffed for feeds before being treated as plain text.

With the current text in HTML5, there's not only no "good answer" but no answer at all (except by telling users to configure their UAs to respect MIME types). Sam's use case could be made compatible by making the response distinguishable from one sent by a misconfigured server.

At this point it seems to me that you are simply not interested in that case. Is this correct?

BR, Julian
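To make the disagreement concrete, here is a minimal Python sketch of the behaviour Ian describes: a UA that sniffs text/plain responses for feeds, plus an opt-out in the spirit of the `authoritative=true` proposal from the thread title. It assumes, as the thread implies, that the proposal is a Content-Type parameter; the feed markers, the 512-byte window and the function names are illustrative assumptions, not any browser's actual algorithm.

```python
from __future__ import annotations

# Hypothetical feed-sniffing heuristic; real browsers' rules differ.
FEED_MARKERS = {
    b"<feed": "application/atom+xml",
    b"<rss": "application/rss+xml",
    b"<rdf:rdf": "application/rdf+xml",
}
SNIFF_WINDOW = 512  # number of leading bytes inspected (assumed)


def parse_content_type(header: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; name=value; ...' into a media type and parameters."""
    media_type, *raw_params = [part.strip() for part in header.split(";")]
    params = {}
    for raw in raw_params:
        if "=" in raw:
            name, value = raw.split("=", 1)
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type.lower(), params


def effective_type(header: str, body: bytes) -> str:
    """Return the type a sniffing UA would act on for this response."""
    media_type, params = parse_content_type(header)
    if params.get("authoritative", "").lower() == "true":
        return media_type  # the server asked for its label to be trusted as-is
    if media_type == "text/plain":
        head = body[:SNIFF_WINDOW].lower()
        for marker, feed_type in FEED_MARKERS.items():
            if marker in head:
                return feed_type  # mislabelled feed: override the header
    return media_type
```

This is the gap Julian points at: without such a parameter, a server has no way to get a genuine plain-text document that happens to contain `<rss` displayed as text.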
#2
Why Microsoft's authoritative=true won't work and is a bad idea
[Sorry for the lack of a common thread in this message; please read it in full before responding.]

On 2008-07-07 at 09:33 +0200, Julian Reschke wrote:

The IETF HTTPbis working group has no mandate to do so. Thus it would need to be rechartered, or a new WG would have to be started. And from a protocol specification and common-sense point of view, it would be the wrong thing to officially allow sniffing even when the content-type is clearly specified.

HTTP already specifies when sniffing is allowed or not. Major browser vendors have over time and by intent chosen to ignore this part of the specifications, and now their ignorance is coming back and biting them and their users. Does this mean that specifications should change to allow for these bugs to grow into a standard feature encouraging ignorance?

It also seems that some noticeable players have lost faith, thinking that things won't improve over time and things will stay as bad or worse over time. An attitude I find a bit disturbing when working with specifications, as it means nothing can be changed or fixed other than documenting how broken the current implementations are today, ending in the rationale that "UTF-7 content sniffing is implemented by some, so it must be supported by everyone, even if completely stupid and current specifications we all agreed to implement years ago say you MUST NOT".

Yes, it does take some years of effort before any result in these areas is seen at all, but it's certainly not impossible. I have been fighting some of these wars, and some hours per year over some years nagging the right people about something which bothers you can make a difference.

Yes, in the end there will be some old minor sites no longer working well with newer browsers if sniffing is deprecated. But there will also be existing major sites working better, being able to use content types as intended instead of having to find ways around the browsers' guessing game.

HTTP intentionally does not specify how sniffing is to be implemented or evaluated. That's a client implementation detail as far as HTTP is concerned, an extra feature to be used when nothing else is known about the content.

How is that possible? Using Microsoft's proposal or by using a separate header, for instance.

If it wasn't for the Apache answer that if such an extension becomes commonly available then it will be set by default by Apache, and things would go back to square -1 by the reasoning applied earlier, with even more bits on the wire that nobody wants to trust, because server admins are by definition not trusted to make their servers conform with requirements, or are in general completely ignorant of whether their content breaks for large parts of their user base because of this.

My concern about the proposal or added header is the reverse. Yes, it will enable servers to tell the next generation of clients to trust them, but on the downside it will give more slack to the proponents who think sniffing is the solution to how to deal with mislabelled content. It's not a real solution to the problem; in fact it encourages that bug to grow even bigger, just adding a workaround to be able to ask that bug to go and hide for a while.

Well, the biggest vendor just put a proposal on the table that would make it possible to disable sniffing altogether. Maybe it would make sense to consider it seriously, instead of immediately stating "won't work"?

It will work, at least temporarily, until there is again a sufficient amount of mislabelled content. The only real long-term solution I see to this problem is for major browser vendors to gradually stop sniffing content even without this extension. Add "server trust" levels similar to how cookie black/whitelisting is managed, enabling the browsers to learn (by user experience) which sites label their content properly and which don't. A good start on this track is to add a visible indication when mislabelled content is detected, enabling users to see when there is something wrong without "destroying the web".

Regards, Henrik
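Henrik's "server trust" idea can be sketched as a per-site record kept by the UA. The minimal Python sketch below assumes the UA can somehow tell whether a given label turned out to be right (for example from its own sniffer or from user feedback); the class name, the thresholds and that signal are all hypothetical. The first check encodes the HTTP/1.1 rule he refers to: RFC 2616, section 7.2.1, only lets a recipient guess the media type when no Content-Type is given.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteRecord:
    correct: int = 0      # responses whose label matched the content
    mislabelled: int = 0  # responses whose label turned out to be wrong


class TrustStore:
    """Hypothetical per-site trust levels, learned from past responses."""

    def __init__(self, min_samples: int = 5, max_error_rate: float = 0.1):
        self.records = defaultdict(SiteRecord)
        self.min_samples = max(1, min_samples)  # at least one sample before judging
        self.max_error_rate = max_error_rate

    def observe(self, site: str, label_was_correct: bool) -> None:
        record = self.records[site]
        if label_was_correct:
            record.correct += 1
        else:
            record.mislabelled += 1

    def should_sniff(self, site: str, content_type: Optional[str]) -> bool:
        if content_type is None:
            return True  # RFC 2616 7.2.1: guessing is allowed without a label
        record = self.records.get(site, SiteRecord())
        total = record.correct + record.mislabelled
        if total < self.min_samples:
            return True  # too little history: keep today's sniffing behaviour
        return record.mislabelled / total > self.max_error_rate
```

In this sketch unknown sites keep being sniffed; a stricter policy would trust their labels from the start, which is closer to the end state Henrik argues for.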
#3
Why Microsoft's authoritative=true won't work and is a bad idea
> There is no "URI group" -- there's a list of people subscribed to the URI mailing list. That being said, I haven't seen *any* kind of consensus that RFC3986 should be changed. I've seen some discussion about whether RFC3987bis should expand on the "LEIRI" topic, and it seems Martin Dürst was considering that input.

It seems to me that the following facts are true:

* The URI group/mailing list is not actively working to update or change the URI specs.
* In the last few weeks, it has become clear that the URI specs need to change for certain aspects of browser behavior and HTML to make sense and/or work right.
* The current URI/URL/"HTTP URL"/IRI breakout is artificial and can/should be fixed in the URI spec.

If what Julian says is correct (and I have no reason to doubt it), how do we get some traction on this issue? Who do we engage? Does it make sense, instead of trying to do the work of an active URI group within the HTML 5 spec (the "HTTP URL" initiative), for a number of us to get involved with getting an *active* URI group going and simply working within that framework on that issue? Yes, it might feel like "packing the court", but if the spec is in desperate need of some reality-based changes, and there is no *active* group willing or able to even consider changes, then I don't see any issue with it.

J.Ja
#4
URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea
Justin James wrote:
> > There is no "URI group" -- there's a list of people subscribed to the URI mailing list. That being said, I haven't seen *any* kind of consensus that RFC3986 should be changed. I've seen some discussion about whether RFC3987bis should expand on the "LEIRI" topic, and it seems Martin Dürst was considering that input.
>
> It seems to me that the following facts are true:
>
> * The URI group/mailing list is not actively working to update or change the URI specs.

There is no URI working group. URI is a stable specification (full IETF standard), and there's no consensus that anything needs to be done with it with respect to "HTML URL". There are individuals (?) working on a revision of the IRI spec, including Martin Dürst. That revision may contain more information about what's currently called LEIRI (Legacy Extended IRI), but I don't think there's consensus about whether this is really a good idea. Head over to the URI mailing list and discuss it, if you're interested.

> * In the last few weeks, it has become clear that the URI specs need to change for certain aspects of browser behavior and HTML to make sense and/or work right.

Nope. What has become clear is that HTML needs to handle a superset of what IRI allows, and also needs to special-case IRI->URI conversion for query components. That can be done in a separate spec, defining a mapping from "HTTP URL" to IRI reference, and then letting the default URI/IRI rules apply. It's not yet clear whether the same is needed outside HTML. Still waiting for examples.

> * The current URI/URL/"HTTP URL"/IRI breakout is artificial and can/should be fixed in the URI spec.

Not sure what you call "breakout", and what you want fixed.

> If what Julian says is correct (and I have no reason to doubt it), how do we get some traction on this issue? Who do we engage? Does it make sense, instead of trying to do the work of an active URI group within the HTML 5 spec (the "HTTP URL" initiative), for a number of us to get involved with getting an *active* URI group going and simply working within that framework on that issue? Yes, it might feel like "packing the court", but if the spec is in desperate need of some reality-based changes, and there is no *active* group willing or able to even consider changes, then I don't see any issue with it.

I think HTML5 defining local rules for treatment of identifiers in HTML documents is fine, if this is done by defining a mapping to IRI (which, as far as I understand, currently is not the case). *If* more specifications need the same kind of mapping (and that's still an "if" for me), it would make sense to extract these mapping rules into a separate spec. Should these specs live in W3C land, it would probably make sense to make this a W3C activity.

BR, Julian
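The difference Julian describes can be sketched in a few lines of Python. `iri_to_uri` follows the RFC 3987 approach of encoding non-ASCII characters as UTF-8 and percent-encoding the resulting bytes; `html_style_to_uri` applies the special case for query components, encoding them in the containing page's character encoding instead. Both function names are hypothetical, the safe-character set is a simplification, and non-ASCII host names (which would need IDNA) as well as characters the page encoding cannot represent are out of scope.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII delimiters left as-is; '%' is included so existing escapes survive.
# This is a simplification of the RFC 3987 section 3.1 mapping.
SAFE = "!#$%&'()*+,/:;=?@[]~"


def iri_to_uri(iri: str) -> str:
    """Encode every non-ASCII character as UTF-8, then percent-encode it."""
    return quote(iri, safe=SAFE, encoding="utf-8")


def html_style_to_uri(iri: str, page_encoding: str) -> str:
    """Same as iri_to_uri, except the query is encoded in the page's charset."""
    parts = urlsplit(iri)
    path = quote(parts.path, safe=SAFE, encoding="utf-8")
    query = quote(parts.query, safe=SAFE, encoding=page_encoding)
    fragment = quote(parts.fragment, safe=SAFE, encoding="utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```

For `http://example.org/café?q=café`, `iri_to_uri` gives `http://example.org/caf%C3%A9?q=caf%C3%A9`, while `html_style_to_uri` with an ISO-8859-1 page gives `http://example.org/caf%C3%A9?q=caf%E9`: the same markup maps to different URIs depending on the page it appears in, and the two functions only agree for UTF-8 pages.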
#5
Why Microsoft's authoritative=true won't work and is a bad idea
Henrik Nordstrom wrote:
> HTTP already specifies when sniffing is allowed or not. Major browser vendors have over time and by intent chosen to ignore this part of the specifications

Indeed, though as far as I can tell all of them except IE did this in the face of the #1 most-commonly-used HTTP server having a "feature" which essentially forced them to do it if they were to have a hope of being compatible with commonly-used websites. That's for text/plain.

Feed sniffing was more a matter of standalone feed readers ignoring Content-Type altogether and treating everything as a feed, which meant that there was zero incentive to label feeds as such. When browsers came to implement a feed reader, the status quo was that a large fraction of feeds (easily double-digit percentages) was mislabeled.

> and now their ignorance is coming back and biting them and their users.

Excuse me? "Ignorance"? Everyone involved knew exactly what they were doing. There were just no good solutions; the small amount of sniffing added seemed like the least bad of a set of bad choices.

> Does this mean that specifications should change to allow for these bugs to grow into a standard feature encouraging ignorance?

The specifications, the UAs, and the servers should change such that:

1) The UAs implement the specification.
2) The servers implement the specification.
3) The specification defines error-handling.
4) The ensemble is a stable equilibrium (Ideally no one has incentive to change behavior).
5) At no point in between here and there is a UA required to do something that would cause its users to stop using it (an obvious non-starter from a UA point of view).
6) At no point in between here and there is a server required to do something that would cause administrators to stop using it (also an obvious non-starter, I would think).

I have no opinion as to what the final state should be, subject to the above constraints.

> It also seems that some noticeable players have lost faith, thinking that things won't improve over time and things will stay as bad or worse over time.

That's an empirical observation of the last 10 years, for what it's worth, not just a "think". If you think the next 10 years will somehow be different, I'd love to know why.

-Boris
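Boris's text/plain case refers to Apache's default of labelling files with unknown extensions as text/plain. One way browsers coped, later written down in the WHATWG MIME Sniffing Standard, is to second-guess only a handful of exact header values and then merely decide between text and binary. The Python sketch below is a simplified version of that idea (byte-order-mark handling is omitted, and the 512-byte window is an assumption); the function name is hypothetical.

```python
# Exact Content-Type values treated as "possibly just the server default".
APACHE_DEFAULT_LABELS = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=UTF-8",
}
# Control bytes that plain text is not expected to contain.
BINARY_BYTES = (set(range(0x00, 0x09)) | {0x0B}
                | set(range(0x0E, 0x1B)) | set(range(0x1C, 0x20)))
SNIFF_WINDOW = 512  # assumed number of leading bytes to inspect


def resolve_text_plain(header: str, body: bytes) -> str:
    """Second-guess only exact server-default labels, and only text vs binary."""
    if header not in APACHE_DEFAULT_LABELS:
        return header  # any other label is taken at face value
    if any(byte in BINARY_BYTES for byte in body[:SNIFF_WINDOW]):
        return "application/octet-stream"
    return "text/plain"
```

Because the match is exact, any other spelling, such as `text/plain; charset=utf-8`, is trusted as sent; the heuristic only touches what an unconfigured server would typically emit.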
#6
Why Microsoft's authoritative=true won't work and is a bad idea
On 2008-07-07 at 09:36 -0400, Justin James wrote:

> * In the last few weeks, it has become clear that the URI specs need to change for certain aspects of browser behavior and HTML to make sense and/or work right.

What's wrong with the HTTP URL specification that makes HTML not make sense or not work right? I know some cases where browsers behave oddly w.r.t. Internet URLs in general (mainly http:// and ftp://), and in all cases so far they are not following specifications and would behave quite well if they did.

Regards, Henrik
#7
Why Microsoft's authoritative=true won't work and is a bad idea
> What's wrong with the HTTP URL specification that makes HTML not make sense or not work right? I know some cases where browsers behave oddly w.r.t. Internet URLs in general (mainly http:// and ftp://), and in all cases so far they are not following specifications and would behave quite well if they did.

Henrik -

The problem with the concept of HTML specifying its own URLs, from my viewpoint, is that developers need one standard to follow, not 3 (URI, IRI, HTTP URL). All too often, once you get more than 2 competing "standards", none of them is actually "standard", and enough of them get enough traction that they never die. I truly think that everyone would be better served if there was simply 1 "U|IR*" standard (it's really sad when a regex is the best way to refer to a group of things) that developers learn and understand. All of the debate on this list over having a "U|IR*" standard added to the HTML spec, in order to compensate for discrepancies between how U|IR*s are commonly used in HTML and the way the specs read, is further proof that the specs are broken.

A simple summary of my thoughts: Any spec which is not properly followed by the majority of developers a majority of the time (where pertinent, of course) is not a "standard" and is a broken spec. Sometimes, it is broken outside of the spec itself, such as being sponsored or ratified by an unrecognized body. Sometimes it is broken within the spec, like 800 page specs describing a floor sweeping process or something. Sometimes it is just a marketing problem (like so many of the X* specs, like XHTML, XForms, XPath, and a zillion other X* specs which few people use).

From what I can tell, the W3C has a very, very hard time producing specs which don't qualify as "broken" by that measure, and HTML is heading that list. Imagine if drive manufacturers followed the SATA spec as well as HTML authors follow the HTML spec. We'd still be using pen and paper.

So we need to be asking ourselves, "what's wrong with HTML that no one follows it?" The answer is not *just* "browsers accept garbage". The answer also includes "a spec so lengthy that only a select few people can understand it well enough to write valid HTML". In other words, HTML is broken from the inside.

J.Ja
#8
Why Microsoft's authoritative=true won't work and is a bad idea
On 2008-07-07 at 14:21 -0400, Boris Zbarsky wrote:

> Excuse me? "Ignorance"? Everyone involved knew exactly what they were doing. There were just no good solutions; the small amount of sniffing added seemed like the least bad of a set of bad choices.

I obviously disagree, but that's my opinion.

> The specifications, the UAs, and the servers should change such that:

I'll add
0) The specifications makes sense and unambious to implement

> 1) The UAs implement the specification.
> 2) The servers implement the specification.
> 3) The specification defines error-handling.
> 4) The ensemble is a stable equilibrium (Ideally no one has incentive to change behavior).
> 5) At no point in between here and there is a UA required to do something that would cause its users to stop using it (an obvious non-starter from a UA point of view).
> 6) At no point in between here and there is a server required to do something that would cause administrators to stop using it (also an obvious non-starter, I would think).

Yes, with some reservations for 5 & 6. I do expect UAs and servers to be willing to correct bugs, even if correcting those bugs would cause some slight interoperability issues with other broken implementations, for the benefit of enabling correct interoperability with correct implementations. Even if this results in some users shifting one way or another.

> I have no opinion as to what the final state should be, subject to the above constraints.

I have some opinions, based on:

- Simplicity.
- No second-guessing or non-obvious side effects. If something is said, it is said and should be trusted to be correct.
- Consistent. As few special cases as possible.

> That's an empirical observation of the last 10 years, for what it's worth, not just a "think". If you think the next 10 years will somehow be different, I'd love to know why.

I have been in this business for more than 10 years, and have not yet lost faith in the ability to work for a more standardized and predictable computing environment. But if standardisation discussions in general tend to focus on "making current broken implementations the standardized status and assuming all implementations will be broken in the same way" instead of what makes sense from a long-term technical standard point of view, then things will certainly spin in the direction of worse.

Regards, Henrik
#9
Why Microsoft's authoritative=true won't work and is a bad idea
On 2008-07-07 at 18:56 -0400, Justin James wrote:

> The problem with the concept of HTML specifying its own URLs, from my viewpoint, is that developers need one standard to follow, not 3 (URI, IRI, HTTP URL).

But I am still not aware of the problem which triggered this. I linger on the HTTP WG, not the HTML one, and am therefore unaware of what problems the HTTP URL/URI/IRI specifications cause for HTML.

> Any spec which is not properly followed by the majority of developers a majority of the time (where pertinent, of course) is not a "standard" and is a broken spec.

There is a large grey zone there. But yes, if every implementer considers what the specs say in some area to be nonsense and implements something other than what the specs say, then the spec is most likely broken. But in quite many cases it's just a poor choice of language making the intentions of the specification not so obvious. Nor is it broken when every implementer implements something else even though what the specs say is correct, because the will to interoperate with existing/older broken implementations is greater than the will to keep a sane implementation. And especially not when there are multiple such areas for historical reasons (of which HTTP has its noticeable share, with 3.5 generations in less than a handful of years).

> Sometimes, it is broken outside of the spec itself, such as being sponsored or ratified by an unrecognized body.

Or implemented before the effects have been properly analyzed.

> Sometimes it is broken within the spec, like 800 page specs describing a floor sweeping process or something.

Yes, and unfortunately many specifications are heading in that direction, growing uncontrollably large with huge amounts of legacy attached. But quite often it's better to clearly define the original intents using the original mechanisms and encourage compliance than to reinvent the same things again only because most implementers got it wrong the first time.

> Sometimes it is just a marketing problem (like so many of the X* specs, like XHTML, XForms, XPath, and a zillion other X* specs which few people use).

Heh.

> From what I can tell, the W3C has a very, very hard time producing specs which don't qualify as "broken" by that measure, and HTML is heading that list.

Can't comment. HTML is not my main field; I stay mostly in the area of protocols and bits. But I do still feel a significant gap between HTML (and related) specifications and user agent implementations, and quite different gaps depending on the implementation. But I still have faith that things will improve over time if one has a little patience, and converge towards the specifications instead of diverging even further apart.

A really big problem is how to get rid of legacy from earlier specifications whose design choices perhaps weren't the best. Once a feature gets into a standard and is implemented in more than one implementation, it's likely to stay for a considerable time even if it turned out to be a very bad idea. Things which are only implemented but not officially standardised, or only in the standards but never implemented, are a whole lot easier to change, as you can always claim that one of the two is wrong/broken. The same goes for when implementations misread specifications, resulting in unintentional deviations from the specification, most often from not understanding the specification or how it applies to what they do. Such mistakes are often relatively easy to get corrected once the right people are made aware of the issue and why it's important to follow the specs.

Regards, Henrik
#10
Why Microsoft's authoritative=true won't work and is a bad idea
Henrik Nordstrom wrote:
> > Excuse me? "Ignorance"? Everyone involved knew exactly what they were doing.
> > There were just no good solutions; the small amount of sniffing added seemed
> > like the least bad of a set of bad choices.
>
> I obviously disagree, but that's my opinion.

You're entitled to it, and I should clarify that the above only applies to the cases in which I've been able to see the reasoning process that led to the decisions (namely Gecko and WebKit).

> 0) The specifications makes sense and unambious to implement

Assuming you meant "unambiguous", I agree. If you meant something else, what did you mean?

> > 5) At no point in between here and there is a UA required to do something
> > that would cause its users to stop using it (an obvious non-starter
> > from a UA point of view).
> > 6) At no point in between here and there is a server required to do
> > something that would cause administrators to stop using it (also an
> > obvious non-starter, I would think).
>
> Yes, with some reservations for 5 & 6. I do expect UAs and servers to be willing to correct bugs, even if correcting those bugs would cause some slight interoperability issues with other broken implementations, for the benefit of enabling correct interoperability with correct implementations. Even if this results in some users shifting one way or another.

So you're asking people to shoot themselves in the foot for the common good. While some may be willing to, in general that's a tough sell if the shooting is significant enough.

Put another way, I can't think of a browser that would be willing to, say, sacrifice 5% of market share on this issue. I suspect sacrificing a single user is acceptable. The line is somewhere in between.

> - Simplicity.

Which is nice if possible, of course. Are we talking simplicity of specification, of implementation, or of deployment?

> - No second-guessing or non-obvious side effects. If something is said it is said and should be trusted to be correct.

This is nice to have, yes.

> - Consistent. As few special cases as possible.

Again, this is nice to have.

-Boris
#11
Why Microsoft's authoritative=true won't work and is a bad idea
On 2008-07-07 at 19:31 -0400, Boris Zbarsky wrote:

> > 0) The specifications makes sense and unambious to implement
>
> Assuming you meant "unambiguous", I agree.

I did. I always have a hard time spelling that word for some reason.

> So you're asking people to shoot themselves in the foot for the common good. While some may be willing to, in general that's a tough sell if the shooting is significant enough.

No, I am not.

> Put another way, I can't think of a browser that would be willing to, say, sacrifice 5% of market share on this issue. I suspect sacrificing a single user is acceptable. The line is somewhere in between.

Yes. The rule is that you sacrifice some share to gain another part and improve long-term stability and reliability.

> > - Simplicity.
>
> Which is nice if possible, of course. Are we talking simplicity of specification, of implementation, or of deployment?

In this discussion, at least specification and implementation. They usually go hand in hand.

Regards, Henrik
#12
the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea
On 08.07.2008 at 09:27, Julian Reschke wrote:

> The other issue that got a lot of discussion is whether the things used in HTML should be called "URL", when in reality they are something else.

Calling them HREFs (even though they also appear in other attributes) would give everyone the right context (HTML) and topic (URLs) without the confusion of redefining existing terms.

//Stefan
--
<green/>bytes GmbH, Hafenweg 16, D-48155 Münster, Germany
Amtsgericht Münster: HRB5782
#13
URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea
Martin Duerst wrote:
It may or may not need such a special case. The truth is that some years ago (less than 10), virtually all existing non-ASCII path information in (U/I)RIs had to be interpreted in the encoding of the containing page. This has changed, because people started to pick up on the idea of IRIs, more and more systems used UTF-8 on the server side, and at least some people understood that using the encoding of the containing page made it impossible to treat such identifiers free-standing. Also, a fallback for paths in legacy encodings is still available (and was always available): %-encoding.

As long as query URIs are interpreted based on the encoding of the containing page, they will stay useless without that context, i.e. they cannot (without further pain) be put into bookmark lists, they cannot be sent in email, and so on. The only sensible way to make this possible is to do the same as for the path part, namely use UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with query parts may be less important than freestanding (U/I)RIs without query parts, but still, they are often convenient. However, they won't work if implemented the way HTML5 currently describes them. Also, same as for path parts, a fallback for query parts in legacy encodings is still available (and was always available): %-encoding.

In summary, there are cases where things changed for the better in the last few years, and there are cases where some solutions make the Web work better than others.

Note that HTML5 documents that aren't encoded in UTF-8 (or UTF-16) and which carry non-ASCII query parameters are currently non-conformant. (I personally don't think it makes a big difference in practice, as HTML5 normatively defines their handling, so people will rely on that anyway.)

> That can be done in a separate spec, defining a mapping from "HTTP URL" to IRI reference, and then letting the default URI/IRI rules apply.

I'm very much confused by "HTTP URL". In case that's the term that HTML5 currently uses, it should use a different one, to avoid confusion.

Actually, I wanted to say "HTML URL" (URL as used in HTML5). HTML5 really uses just the term "URL".

BR, Julian
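Martin's point about freestanding query strings can be illustrated with a small Python sketch: the same query text, encoded per the containing page's charset, produces different bytes on the wire, while a query the author has already %-encoded is plain ASCII and comes out identical whatever the page encoding. The function name and the example values are illustrative only.

```python
from urllib.parse import quote


def encode_query(query: str, page_encoding: str) -> str:
    """Percent-encode a query the HTML-style way, using the page's charset."""
    # '%' is kept so that escapes already present pass through unchanged.
    return quote(query, safe="=&%+", encoding=page_encoding)


RAW = "q=caf\u00e9"    # non-ASCII query as written in the markup
ESCAPED = "q=caf%E9"   # the same Latin-1 bytes, %-encoded by the author

if __name__ == "__main__":
    for encoding in ("iso-8859-1", "utf-8"):
        print(encoding, encode_query(RAW, encoding), encode_query(ESCAPED, encoding))
```

The raw query becomes `q=caf%E9` on an ISO-8859-1 page but `q=caf%C3%A9` on a UTF-8 page, so it cannot be copied out of its page safely; the escaped form stays `q=caf%E9` in both, which is the %-encoding fallback Martin mentions.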
#14
URI/IRI vs HTML-URL, was: Why Microsoft's authoritative=true won't work and is a bad idea
On Jul 8, 2008, at 9:13 AM, Martin Duerst wrote:

> As long as query URIs are interpreted based on the encoding of the containing page, they will stay useless without that context, i.e. they cannot (without further pain) be put into bookmark lists, they cannot be sent in email, and so on. The only sensible way to make this possible is to do the same as for the path part, namely use UTF-8 for the IRI->URI conversion. Freestanding (U/I)RIs with query parts may be less important than freestanding (U/I)RIs without query parts, but still, they are often convenient. However, they won't work if implemented the way HTML5 is currently describing them. Also, same as for path parts, a fallback for query parts in legacy encodings is still available (and was always available): %-encoding.

Some implementations also break the fallback %-encoding by first trying to reinterpret the %-encoding within the current document encoding and then translating where appropriate. For example, if the percent-encoding represents a Unicode code point that maps to the current document encoding, the implementation uses the translated bytes instead of the literal percent-encoded bytes. I'm not sure whether this is an unfixable implementation error or whether we could use HTML5 to get these implementations back on track, though.

On Jul 8, 2008, at 11:20 AM, Stefan Eissing wrote:

> > On 08.07.2008 at 09:27, Julian Reschke wrote:
> > The other issue that got a lot of discussion is whether the things
> > used in HTML should be called "URL", when in reality they are
> > something else.
>
> Calling them HREFs (even though they also appear in other attributes) would give everyone the right context (HTML) and topic (URLs) without the confusion of redefining existing terms.

From the relevant RFCs, the term "URL reference" already exists and is the appropriate term for the value taken by the @href, @cite, @src and other attributes ("URI reference" or "IRI reference" might also make sense).

Take care, Rob
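The breakage Rob describes can be reproduced in a short Python sketch. `reinterpret_escapes` is a hypothetical model of such an implementation, not code from any real browser: each run of %XX escapes is decoded as UTF-8 and, if the resulting characters exist in the document encoding, replaced with that encoding's bytes. A correct implementation would simply leave the escapes untouched.

```python
import re

ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def reinterpret_escapes(query: str, page_encoding: str) -> str:
    """Model of the buggy behaviour: rewrite %-escapes into the page encoding."""

    def rewrite(match: re.Match) -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            reencoded = raw.decode("utf-8").encode(page_encoding)
        except (UnicodeDecodeError, UnicodeEncodeError):
            return run  # no mapping: the literal escapes survive
        return "".join(f"%{byte:02X}" for byte in reencoded)

    return ESCAPE_RUN.sub(rewrite, query)
```

On an ISO-8859-1 page, an author-supplied `q=caf%C3%A9` is sent as `q=caf%E9`, so a server expecting UTF-8 receives bytes the author never wrote. Escapes that do not map (for example `%E2%82%AC`, the euro sign, on an ISO-8859-1 page) pass through, which is why the bug only shows up for some characters and some page encodings.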
#15
the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea
> > The other issue that got a lot of discussion is whether the things used in HTML should be called "URL", when in reality they are something else.
>
> Calling them HREFs (even though they also appear in other attributes) would give everyone the right context (HTML) and topic (URLs) without the confusion of redefining existing terms.

Having nearly identical concepts is the root of this problem, not the nearly identical names (although that does not help either). There is no need to have a different spec for URI, IRI, and "HTTP URL", "URL reference", "HREF" (or whatever this mystery spec is being called). There should be *one* spec for resource locations. Period. Besides, defining resource locators is outside the domain of HTML as far as I am concerned.

J.Ja
#16
the "HTML URL" issue, was: Why Microsoft's authoritative=true won't work and is a bad idea
On 08.07.2008 at 15:55, Justin James wrote:

> Having nearly identical concepts is the root of this problem, not the nearly identical names (although that does not help either). There is no need to have a different spec for URI, IRI, and "HTTP URL", "URL reference", "HREF" (or whatever this mystery spec is being called). There should be *one*