#1
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
Jul 5, 2008, at 1:04 PM, Rob Shearer wrote:
I'm providing you with my experience: every user I've ever spoken to about this topic has wanted the real number line. They are used to using the xsd datatypes `float` and `double` to represent number values, so they use these without values in WL to mean "some number".

> Do they mean bounded numbers? (i.e. with min and max sizes?) Do they distinguish between double and float? Do they care about NaNs? (Alan's users care about the latter.)

Whether it's "forall R 1.0^^xsd:float" or "forall R `xsd:float`" they seem to intend a dense number line.

So you had user defined restrictions on floats, interesting.

In the first case `float` is just the easiest way to specify the value; in the second you can certainly argue that they should have used `decimal`, but that's a pointless argument because my reasoner didn't really support decimal.

That's interesting. I think part of what we need to do is select a set of sane datatypes to require. String, Integer, reals seem reasonable.

My experience is that the use of xsd datatypes as value spaces in WL 1.0 causes users to write what they don't mean.

> For me, this would suggest removing them or enforcing them more clearly.

I'd suggest removing them.

That's where I'm heading too.

My experience is that *every* ontology using `xsd:float` and `xsd:double` without values would be better off using `xsd:decimal`, but that the user intent was "some real number" (and I should note that I'm against requiring support for `xsd:decimal` values).

> Values? the datatype? In WL 1, all these types were optional and poorly specced and had no documentation whatsoever. Part of the goal here is to spec well and document clearly any types we require.

I would like to use doubles internally to represent points on the real number line.

For what lexical syntax?

Some homogeneous mix of internal representations is a pain. And I seriously doubt that many users really care about the extra representation power of `decimal`. It makes sense as an optional feature reasoners can support, but it seems completely unnecessary to require it in the spec; it's exactly the sort of thing I'd put off implementing indefinitely until users asked for it.

The reason `decimal` keeps coming up is just that it's dense.

That's true. But there are several issues floating about, including the possibility of interaction between floats and cardinality. It seems to me that for most users, that will be a rare occurrence, even accidentally. It certainly requires ranges of floats (since it's unlikely that the cardinalities required to cause a problem would be feasible anyway). E.g., if we had unbounded binary numbers then such floats would be no harder than integers.

So are we using the xsd spec as an excuse to conflate density with complex internal representations?

I don't think so.

[snip]

Referring any user over to that spec to understand value spaces is obnoxious and counter-productive:

We definitely don't intend to do that, I hope. Part of our current effort is to make sure we carefully document the types we require and/or sanction.

even WG members seem to be having trouble grokking it. (And bravo to anyone making the pedantic point that a particular value is a degenerate value space.)

I contend that WL users only want a tiny tiny number of different value spaces to play with: integers, strings, and reals.

I certainly agree that these are key. I think the group agrees too. The other types are something of a legacy.

It is possible, however, that they will want a larger number of ways to lexically represent particular values within these three spaces.

This wouldn't surprise me at all.

Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

You mean the type name or the lexical syntax (e.g., "12.78e-2")? I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

but they do *not* have any interest in use of the `xsd:float` value space.

Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers.

My proposed solution would be to use brand-new WL names for all value spaces, but use xsd syntax to specify particular values.

Could you say what you think the lexical space of the reals should include? At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

[snipped lots of useful details]

Thanks very much for those. I find them extremely helpful.

Thanks for the feedback.

> Cheers,
> Bijan.

And if you're going to request further comment from a member of the public, could you please do it on a list to which the public can post? Shifting back to the WG list excludes me from comment.

D'oh! Sorry. That was an accident. My apologies.

(Which is fine if you don't address questions directly to me.)

Thanks again for the discussion.

Cheers,
Bijan.
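A note on the float/cardinality interaction raised in this message: a range restricted to the `xsd:float` value space contains only finitely many values, so a large enough minimum cardinality over that range cannot be satisfied, while the same range on a dense number line has no such limit. The Python sketch below counts the single-precision values in a closed interval. The helper names `float32_ordinal` and `count_float32_between` are invented for illustration; the code assumes both bounds are finite and exactly representable in single precision, and it treats -0 and +0 as one value.

```python
import struct


def float32_ordinal(x: float) -> int:
    """Map a finite float32 value to an integer that preserves numeric order."""
    # Round x to single precision and read back its 32-bit pattern.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    magnitude = bits & 0x7FFFFFFF
    # For finite values the bit pattern grows with magnitude;
    # negate when the sign bit is set so that order is preserved.
    return -magnitude if bits & 0x80000000 else magnitude


def count_float32_between(lo: float, hi: float) -> int:
    """Number of distinct float32 values v with lo <= v <= hi."""
    if lo > hi:
        return 0
    return float32_ordinal(hi) - float32_ordinal(lo) + 1


print(count_float32_between(1.0, 2.0))  # 8388609, i.e. 2**23 + 1
```

Under these assumptions, a restriction demanding more than 8388609 distinct values in [1.0, 2.0] would be unsatisfiable over the float value space but satisfiable over the reals.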
|
#2
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
> Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

You mean the type name or the lexical syntax (e.g., "12.78e-2")?

XSD offers a lexical syntax for points that happen to lie on the real number line; that's what I suggest using it for. The easiest approach is that xsd names on their own are not valid "datatypes"; particular values encoded using xsd, however, are (because particular values are single-element value spaces).

I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

> but they do *not* have any interest in use of the `xsd:float` value space.

Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

I'd endorse including only one zero, but I agree there's an issue with NaN. My principled stand is that it's inconsistent (a value space of size zero), but I'd definitely want to analyze the use cases to see who loses important functionality from that decision.

But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident". That's the situation we've fallen into with floats in WL 1.0.

> Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers.
>
> My proposed solution would be to use brand-new WL names for all value spaces, but use xsd syntax to specify particular values.

Could you say what you think the lexical space of the reals should include?

I don't know what you mean by "lexical space of the reals". I don't propose defining the reals lexically; I propose defining the value space mathematically. But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values. I expect most implementations will also support points represented as `xsd:double` and `xsd:long`. I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

The WG should consider that some implementations might allow lots of xsd syntaxes but lose precision on some of them (allow use of `xsd:decimal` in ontology files for user convenience, but convert them to floats during parsing); a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful. My big concern here is that an ontology will be developed and tested with a reasoner with "full" `xsd:decimal` support but then when it's used with an implementation with "imprecise" `xsd:decimal` support everything goes pear-shaped. Spitting out warnings during parsing isn't a great solution.

And of course some implementations might offer additional value spaces as well, but I'd like the spec to make it very clear that this is a very different thing than the above. For one thing, I'd suggest outlawing any use of names within the xsd namespace for value spaces, even spaces implementors have added as extensions. "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for points on the real number line and nothing else.

-rob
|
#3
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
> XSD offers a lexical syntax for points that happen to lie on the real number line

It offers several and we're free to define one for owl:real. If we use any decimal notation, we have exactness problems (e.g., 1/3), but decimal is very user friendly. So, I was thinking that the valid syntax for a real would be decimal floating points and ratios of integers. We could include scientific notation as well.

Why on earth would the WL group come up with their own syntax for encoding numbers? The XSchema guys have already done that, and people have implemented parsers for their spec. If there's going to be a syntax for rationals or algebraics, then that seems to be right up their alley.

> But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation, regardless of density issues.

I thought I had made my proposal clear on this: the value space does not have holes. The representations supported for particular values are not sufficient to address all the points in that space, but the space itself does *not* have holes.

> I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (e.g., the syntax) and a value space. You are suggesting, I thought, that we adopt a value space that is the reals and something about using xsd syntax (i.e., lexical spaces) for the syntax.

For the syntax of particular values. I keep trying to stress that value spaces should be kept separate from the syntax used for particular values.

XSD offers exact syntax only for binary and decimals (I believe it's exact for binary). I was wondering what sort of lexical space you want.

XSD offers a well-defined mapping from lexical representation to IEEE floats. XSD defines an *exact* value for each valid lexical representation. You may not like the way the mapping is defined (because the value of "1.1e0^^xsd:float" on the real number line is not equal to the value of "1.1^^xsd:decimal"), but there is no imprecision whatsoever about what each string represents. I am satisfied with the work the XSchema group did on floating-point lexical representations.

> But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, i.e., "1", "1.0", and "12.78e-2".

No. I want `real` to be a value space with no lexical connotations. I want to be able to specify a particular point in this value space using a string such as "1.0e0^^xsd:float". The XSD lexical forms are not "the lexical space for reals". There is no such thing as "the lexical space for reals". There is such a thing as "the space of lexical representations which a conformant implementation must support for particular values in the real value space", but this space is much smaller than the real value space.

If we want exactness for the rationals, we need either to allow repeating (e.g., 0.333repeating) (usually done with a macron) or fraction syntax (e.g., 1/3).

I don't intend to support exactness for rationals. A conformant implementation should only be required to provide exact support for `xsd:int` and `xsd:float` values.

> I expect most implementations will also support points represented as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space.

Supporting these syntaxes means that reasoners must also support reasoning with the particular values representable in those syntaxes. Support for additional syntaxes does not change the underlying semantics of the real number line, but it might make implementation of those semantics a bit harder.

(Sorry for using the XSD terminology, but I think it's a bit clearer if we stick to it for the moment.)

> I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal type with their core libraries, I'm not so clear on the first.

I'm not sure Java is an example of "more and more languages". In fact it is the flagship "you only ever need one language" proposal. And even in Java you have to program differently if you're going to play with polymorphic numbers than you would if you stuck to ints and floats. I'd like to write a distributed WL reasoner in Erlang. But Javascript and C are perhaps more persuasive counterexamples to your argument.

I'd like to hear more about the second.

The most efficient bignum and decimal libraries are an order of magnitude slower than corresponding int and float calculations. Hardware is good with ints and floats.

> a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced. At the moment, we only have required and optional as modalities of support. I think supporting various levels of precision (or variant mapping) would be quite hard to understand.

But presumably you're making clear that implementations which implement some "optional" functionality, but do so in a way which contradicts the optional semantics, are non-compliant. If so, then specifying what support for additional lexical representations means (i.e. exact) would make clear that a product which parsed `xsd:decimal` but internally converted to floating point would not "support `xsd:decimal`" by the terms of the WL 2.0 spec. The implementors could always claim "partial support", however.

In this model, users would just have to decide between integers and reals. We could have quite a wide lexical space for reals (and even for integers, i.e., allow 1.0 to mean the integer 1).

I'm getting really confused about what you're talking about. Values appearing in XML and RDF WL 2.0 files should be typed; there's no need at all to guess the type based on syntax. And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly the same point on the real number line.

But "0.1"^^xsd:float would not be required, but also we wouldn't change the meaning along the lines you suggest (we'd just be silent about it). It's fairly simple to migrate old ontologies to the new one with a simple converter. If enough implementations did it silently, that would be information for a future group.

No idea what this means. But I'm guessing I disagree with it.

-rob
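For readers following the distinction between a value space and the syntax for particular values, here is a rough Python sketch of the idea that each typed literal denotes one exact point on the real line. The function `to_real` and its handling of datatype names are assumptions made for illustration, not anything defined in the thread or in a spec. Python's `float()` and `int()` parsers are also only an approximation of the XSD lexical rules, and going through a double before rounding to single precision can, in rare cases, differ from rounding the decimal string directly.

```python
import math
import struct
from decimal import Decimal
from fractions import Fraction


def to_real(lexical: str, datatype: str) -> Fraction:
    """Return the exact point on the real line denoted by a typed literal."""
    if datatype in ("xsd:int", "xsd:long", "xsd:integer"):
        return Fraction(int(lexical))
    if datatype == "xsd:decimal":
        return Fraction(Decimal(lexical))
    if datatype in ("xsd:float", "xsd:double"):
        value = float(lexical)  # nearest IEEE double
        if not math.isfinite(value):
            # NaN and the infinities are not points on the real line.
            raise ValueError(f"{lexical!r} is not a point on the real line")
        if datatype == "xsd:float":
            # Round to single precision, then widen back (widening is exact).
            # Finite doubles beyond float32 range raise OverflowError here.
            value = struct.unpack(">f", struct.pack(">f", value))[0]
        return Fraction(value)
    raise ValueError(f"unsupported datatype: {datatype}")


# Same point, different syntax.
print(to_real("1.0e0", "xsd:float") == to_real("1", "xsd:integer"))   # True
# Different points: the float literal denotes the nearest binary value.
print(to_real("1.1e0", "xsd:float") == to_real("1.1", "xsd:decimal"))  # False
```

Using `Fraction` keeps every comparison exact, which is the sense in which each string names a precise value even when that value is not the one a casual reader expects.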
|
#4
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
Jul 6, 2008, at 8:07 PM, Rob Shearer wrote:
Most importantly, I do not think there is necessarily a direct correlation between the lexical representations used to represent particular values and the value spaces in which those particular values live. I.e. users want to be able to specify particular values within the `real` value space using `xsd:float`,

> You mean the type name or the lexical syntax (e.g., "12.78e-2")?

XSD offers a lexical syntax for points that happen to lie on the real number line

It offers several and we're free to define one for owl:real. If we use any decimal notation, we have exactness problems (e.g., 1/3), but decimal is very user friendly. So, I was thinking that the valid syntax for a real would be decimal floating points and ratios of integers. We could include scientific notation as well.

That's what I suggest using it for. The easiest approach is that xsd names on their own are not valid "datatypes"; particular values encoded using xsd, however, are (because particular values are single-element value spaces).

> I'm personally more comfortable with allowing the latter than pushing "xsd:float" as a synonym for the real value space. Your mileage obviously varies.

but they do *not* have any interest in use of the `xsd:float` value space.

> Some do at least to the extent of wanting NaN (and perhaps -0). I'd personally prefer not to shove them into the real type (certainly NaN; I suppose we could make our reals the affine reals and handle +inf).

I'd endorse including only one zero, but I agree there's an issue with NaN.

And the infinities, though we could always go for the affine real line.

My principled stand is that it's inconsistent (a value space of size zero), but I'd definitely want to analyze the use cases to see who loses important functionality from that decision.

But my main point is that users have no interest in the "holes" introduced by the xsd:float value space: providing them access to a value space of numbers representable in float representation is not useful, and could lead to lots of confusion, particularly if users could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation, regardless of density issues.

That's the situation we've fallen into with floats in WL 1.0.

Thus we've got two orthogonal concepts which happen to coincide for strings and integers but not for real numbers. My proposed solution would be to use brand-new WL names for all value spaces, but use xsd syntax to specify particular values.

> Could you say what you think the lexical space of the reals should include?

I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (e.g., the syntax) and a value space. You are suggesting, I thought, that we adopt a value space that is the reals and something about using xsd syntax (i.e., lexical spaces) for the syntax. XSD offers exact syntax only for binary and decimals (I believe it's exact for binary). I was wondering what sort of lexical space you want.

I don't propose defining the reals lexically;

Sure.

I propose defining the value space mathematically.

Well, of course. But that's what XSD does as well. The decimals are a well defined mathematical set.

But implementations should allow users to specify particular points in that value space using the lexical representations for `xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, i.e., "1", "1.0", and "12.78e-2". If we want exactness for the rationals, we need either to allow repeating (e.g., 0.333repeating) (usually done with a macron) or fraction syntax (e.g., 1/3).

I expect most implementations will also support points represented as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space. (Sorry for using the XSD terminology, but I think it's a bit clearer if we stick to it for the moment.)

I do *not* think a conformant implementation should have to deal with arbitrary points represented as `xsd:decimal` (since the vast majority of users don't need the extra representational power, and there is substantial implementation burden and performance penalty for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal type with their core libraries, I'm not so clear on the first. I'd like to hear more about the second.

> At least, as a first cut? (It seems decimal, scientific, and rational notation would all be useful, the first two for common ways of writing and the third for full coverage of the rationals.)

The WG should consider that some implementations might allow lots of xsd syntaxes but lose precision on some of them (allow use of `xsd:decimal` in ontology files for user convenience, but convert them to floats during parsing)

This can cause quite serious interoperability problems. So I'm inclined against it on first blush.

a vocabulary for what it means to "support" a numeric xsd type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced. At the moment, we only have required and optional as modalities of support. I think supporting various levels of precision (or variant mapping) would be quite hard to understand.

My big concern here is that an ontology will be developed and tested with a reasoner with "full" `xsd:decimal` support but then when it's used with an implementation with "imprecise" `xsd:decimal` support everything goes pear-shaped.

That would be bad :) There could be subtler problems if people mapped decimal syntax to binary in variant ways (i.e., which float do you take 0.1 to?)

Spitting out warnings during parsing isn't a great solution.

And of course some implementations might offer additional value spaces as well, but I'd like the spec to make it very clear that this is a very different thing than the above. For one thing, I'd suggest outlawing any use of names within the xsd namespace for value spaces, even spaces implementors have added as extensions. "Support for `xsd:decimal`" should mean `xsd:decimal` syntax for points on the real number line and nothing else.

This doesn't seem likely. Existing implementations already do different things with different xsd types. It'll be very hard to get buy in from the RDF community. It seems like a more likely strategy is to fix a (required) set of WL types (or core types) which are easy to understand and robust with respect to intuitive behavior, and leave the more specialized types for future people to standardize.

In this model, users would just have to decide between integers and reals. We could have quite a wide lexical space for reals (and even for integers, i.e., allow 1.0 to mean the integer 1). But "0.1"^^xsd:float would not be required, but also we wouldn't change the meaning along the lines you suggest (we'd just be silent about it). It's fairly simple to migrate old ontologies to the new one with a simple converter. If enough implementations did it silently, that would be information for a future group.

Thanks again.

Cheers,
Bijan.
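The "pear-shaped" scenario can be illustrated with a small Python sketch. It contrasts a parser that keeps `xsd:decimal` literals exact with one that silently converts them to binary doubles. The names `exact_decimal`, `lossy_decimal` and `distinct_values` are hypothetical, and the two literals were picked only because they happen to round to the same double.

```python
from decimal import Decimal


def exact_decimal(lexical: str) -> Decimal:
    """'Full' support: keep the decimal value exactly as written."""
    return Decimal(lexical)


def lossy_decimal(lexical: str) -> float:
    """'Imprecise' support: convert to the nearest IEEE double while parsing."""
    return float(lexical)


def distinct_values(literals, parse) -> int:
    """How many different values a reasoner using `parse` would see."""
    return len({parse(lexical) for lexical in literals})


literals = ["0.1", "0.10000000000000001"]
print(distinct_values(literals, exact_decimal))  # 2
print(distinct_values(literals, lossy_decimal))  # 1
```

If those two literals were asserted as values of a functional property, a reasoner with exact support would see two distinct values and report an inconsistency, while the lossy one would see a single value and raise nothing. That is the kind of divergence between implementations that a parse-time warning does little to prevent.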
|
#5
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
The XSchema guys have already done that, and people have implemented parsers for their spec. If there's going to be a syntax for rationals or algebraics, then that seems to be right up their alley.

They don't seem interested, alas.

> And I very much hope the WL WG takes that as a sign that they should be even less interested.

The reason (one member) gave (privately) is that they didn't think that reals beyond decimals were necessary for a schema language. I think we agree that they are for an ontology language. So, my conclusion is the opposite of your hope.

Rational numbers, and linear equations, and n-ary data predicates, all seem *much* more relevant to data representation and model checking than satisfiability reasoning; these are systems people want to use to store and compute particular values based on input, not to check satisfiability. (The n-ary datatype use cases, for example, don't offer much insight into how such a feature could be used to draw valuable new inferences.) And yet the XSchema group, the data representation and model-checking crowd, decided that such notions were far too ambitious for even them. Again, I urge the WL working group to follow that example and focus on the small set of features which will actually benefit users, and make sure that they get those features right.

-rob
|
#6
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum
> The integer data domain is a subset of the number data domain. There is absolutely no need for a float data domain. WL implementations should support particular values encoded using the `xsd:int` and `xsd:float` lexical representations. These values are all in the number domain.

This goes against existing implementation and use, wherein xsd:float is disjoint from xsd:int.

Cerebra did not make them disjoint. Neither does KAN2. And testing reveals that neither does FaCT. The only reasoner I can find that makes them disjoint is Pellet.

This sheds a lot of light on your definition of "existing implementation and use". If you intend to attempt to enshrine bugs in Clark & Parsia products in the WL standard, I suggest that you do it as a representative of Clark & Parsia and not as a representative of Manchester, which has an interest in FaCT.

-rob