Standards
 
  #1  
July 6th, 2008, 01:20 PM
Bijan Parsia
ISSUE-126 (Revisit Datatypes): A new proposal for the real <-> float <-> double conundrum

Jul 5, 2008, at 1:04 PM, Rob Shearer wrote:

I'm providing you with my experience: every user I've ever spoken
to about this topic has wanted the real number line.
They are used to using the xsd datatypes `float` and `double` to
represent number values, so they use these without values in OWL
to mean "some number".
>>

>Do they mean bounded numbers? (i.e. with min and max sizes?) Do
>they distinguish between double and float? Do they care about
>NaNs? (Alan's users care about the latter.)
>

Whether it's "forall R 1.0^^xsd:float" or "forall R `xsd:float`"
they seem to intend a dense number line.

So you had user defined restrictions on floats, interesting.

In the first case `float` is just the easiest way to specify the
value; in the second you can certainly argue that they should have
used `decimal`, but that's a pointless argument because my
reasoner didn't really support decimal.

That's interesting. I think part of what we need to do is select a set
of sane datatypes to require. String, Integer, reals seem reasonable.

My experience is that the use of xsd datatypes as value spaces in
OWL 1.0 causes users to write what they don't mean.
>>

>For me, this would suggest removing them or enforcing them more
>clearly.
>

I'd suggest removing them.

That's where I'm heading too.

My experience is that *every* ontology using `xsd:float` and
`xsd:double` without values would be better off using
`xsd:decimal`, but that the user intent was "some real
number" (and I should note that I'm against requiring support for
`xsd:decimal` values).
>>

>Values? the datatype? In OWL 1, all these types were optional
>and poorly speced and had no documentation whatsoever. Part of the
>goal here is to spec well and document clearly any types we require.
>

I would like to use doubles internally to represent points on the
real number line.

For what lexical syntax?

Some homogeneous mix of internal representations is a pain. And I
seriously doubt that many users really care about the extra
representation power of `decimal`. It makes sense as an optional
feature reasoners can support, but it seems completely unnecessary
to require it in the spec. It's exactly the sort of thing I'd put
off implementing indefinitely unless users asked for it.
>

The reason `decimal` keeps coming up is just that it's dense.

That's true. But there are several issues floating about, including
the possibility of interaction between floats and cardinality. It
seems to me that for most users, that will be a rare occurrence, even
accidentally. It certainly requires ranges of floats (since it's
unlikely that the cardinalities required to cause a problem would be
feasible anyway). E.g., if we had unbounded binary numbers then such
floats would be no harder than integers.

So are we using the xsd spec as an excuse to conflate density with
complex internal representations?

I don't think so.

[snip]
Referring any user over to that spec to understand value spaces is
obnoxious and counter-productive:

We definitely don't intend to do that, I hope. Part of our current
effort is to make sure we carefully document the types we require and/
or sanction.

even WG members seem to be having trouble grokking it. (And bravo
to anyone making the pedantic point that a particular value is a
degenerate value space.)
>

I contend that OWL users only want a tiny tiny number of different
value spaces to play with: integers, strings, and reals.

I certainly agree that these are key. I think the group agrees too.
The other types are something of a legacy.

It is possible, however, that they will want a larger number of
ways to lexically represent particular values within these three
spaces.

This wouldn't surprise me at all.

Most importantly, I do not think there is necessarily a direct
correlation between the lexical representations used to represent
particular values and the value spaces in which those particular
values live. I.e. users want to be able to specify particular
values within the `real` value space using `xsd:float`,

You mean the type name or the lexical syntax (e.g., "12.78e-2")? I'm
personally more comfortable with allowing the latter than pushing
"xsd:float" as a synonym for the real value space. Your mileage
obviously varies.

but they do *not* have any interest in use of the `xsd:float` value
space.

Some do at least to the extent of wanting NaN (and perhaps -0). I'd
personally prefer not to shove them into the real type (certainly
NaN; I suppose we could make our reals the affine reals and handle
+inf).

Thus we've got two orthogonal concepts which happen to coincide for
strings and integers but not for real numbers.
>

My proposed solution would be to use brand-new OWL names for all
value spaces, but use xsd syntax to specify particular values.

Could you say what you think the lexical space of the reals should
include? At least, as a first cut? (It seems decimal, scientific, and
rational notation would all be useful, the first two for common ways
of writing and the third for full coverage of the rationals.)

[snipped lots of useful details]

Thanks very much for those. I find them extremely helpful.

Thanks for the feedback.
>>

>Cheers,
>Bijan.
>

And if you're going to request further comment from a member of the
public, could you please do it on a list to which the public can
post? Shifting back to the WG list excludes me from comment.

D'oh! Sorry. That was an accident. My apologies.

(Which is fine if you don't address questions directly to me.)

Thanks again for the discussion.

Cheers,
Bijan.

  #2  
July 6th, 2008, 02:40 PM
Rob Shearer

>Most importantly, I do not think there is necessarily a direct
>correlation between the lexical representations used to represent
>particular values and the value spaces in which those particular
>values live. I.e. users want to be able to specify particular
>values within the `real` value space using `xsd:float`,
>

You mean the type name or the lexical syntax (e.g., "12.78e-2")?

XSD offers a lexical syntax for points that happen to lie on the real
number line. That's what I suggest using it for. The easiest approach
is that xsd names on their own are not valid "datatypes"; particular
values encoded using xsd, however, are (because particular values are
single-element value spaces).

I'm personally more comfortable with allowing the latter than
pushing "xsd:float" as a synonym for the real value space. Your
mileage obviously varies.
>
>but they do *not* have any interest in use of the `xsd:float` value
>space.
>

Some do at least to the extent of wanting NaN (and perhaps -0). I'd
personally prefer not to shove them into the real type (certainly
NaN; I suppose we could make our reals the affine reals and handle
+inf).

I'd endorse including only one zero, but I agree there's an issue with
NaN. My principled stand is that it's inconsistent (a value space of
size zero), but I'd definitely want to analyze the use cases to see
who loses important functionality from that decision.

But my main point is that users have no interest in the "holes"
introduced by the xsd:float value space: providing them access to a
value space of numbers representable in float representation is not
useful, and could lead to lots of confusion, particularly if users
could easily use such a space "by accident". That's the situation
we've fallen into with floats in OWL 1.0.
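
To make the "holes" concrete: a minimal Python sketch, assuming IEEE 754 single precision, that counts how many `xsd:float` values lie in a closed interval. The helper names are hypothetical, and the bounds are assumed to be non-negative values that are themselves exactly representable as 32-bit floats.

```python
import struct


def float32_to_ordinal(x: float) -> int:
    """Index of a non-negative finite float32 value in increasing order."""
    # For non-negative IEEE 754 floats, the raw bit pattern read as an
    # unsigned integer grows monotonically with the value it encodes.
    return struct.unpack(">I", struct.pack(">f", x))[0]


def count_float32_between(lo: float, hi: float) -> int:
    """Count float32 values v with lo <= v <= hi.

    Assumption: lo and hi are non-negative and exactly representable
    as 32-bit floats (otherwise struct.pack rounds them first).
    """
    return float32_to_ordinal(hi) - float32_to_ordinal(lo) + 1


print(count_float32_between(1.0, 2.0))  # 8388609, i.e. 2**23 + 1
```

So a range restriction over the `xsd:float` value space admits only finitely many fillers and can interact with cardinality restrictions, whereas the same interval on the real number line contains infinitely many points.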

>Thus we've got two orthogonal concepts which happen to coincide for
>strings and integers but not for real numbers.
>>

>My proposed solution would be to use brand-new OWL names for all
>value spaces, but use xsd syntax to specify particular values.
>

Could you say what you think the lexical space of the reals should
include?

I don't know what you mean by "lexical space of the reals". I don't
propose defining the reals lexically; I propose defining the value
space mathematically. But implementations should allow users to
specify particular points in that value space using the lexical
representations for `xsd:float` and `xsd:int` values. I expect most
implementations will also support points represented as `xsd:double`
and `xsd:long` as well. I do *not* think a conformant implementation
should have to deal with arbitrary points represented as `xsd:decimal`
(since the vast majority of users don't need the extra
representational power, and there is substantial implementation burden
and performance penalty for dealing with such values correctly).

At least, as a first cut? (It seems decimal, scientific, and
rational notation would all be useful, the first two for common ways
of writing and the third for full coverage of the rationals.)

The WG should consider that some implementations might allow lots of
xsd syntaxes but lose precision on some of them (allow use of
`xsd:decimal` in ontology files for user convenience, but convert them
to floats during parsing); a vocabulary for what it means to
"support" a numeric xsd type for particular values would be useful. My
big concern here is that an ontology will be developed and tested with
a reasoner with "full" `xsd:decimal` support but then when it's used
with an implementation with "imprecise" `xsd:decimal` support
everything goes pear-shaped. Spitting out warnings during parsing
isn't a great solution.
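
A minimal sketch of how things could go pear-shaped, assuming the "imprecise" implementation converts `xsd:decimal` literals to 64-bit doubles while parsing. The function names and the two literals are made up for illustration.

```python
from decimal import Decimal
from fractions import Fraction


def same_point_exact(a: str, b: str) -> bool:
    """Compare two decimal literals as exact points on the real line."""
    return Fraction(Decimal(a)) == Fraction(Decimal(b))


def same_point_via_double(a: str, b: str) -> bool:
    """Compare the same literals after lossy conversion to a double."""
    return float(Decimal(a)) == float(Decimal(b))


x, y = "0.1", "0.10000000000000000001"
print(same_point_exact(x, y))       # False: two distinct reals
print(same_point_via_double(x, y))  # True: both round to the same double
```

A reasoner with "full" support sees two different values here; one that converts during parsing sees a single value, so anything that depends on the two being distinct (or equal) can change between implementations.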

And of course some implementations might offer additional value spaces
as well, but I'd like the spec to make it very clear that this is a
very different thing than the above. For one thing, I'd suggest
outlawing any use of names within the xsd namespace for value spaces,
even spaces implementors have added as extensions. "Support for
`xsd:decimal`" should mean `xsd:decimal` syntax for points on the real
number line and nothing else.

-rob

  #3  
July 6th, 2008, 05:20 PM
Rob Shearer

>XSD offers a lexical syntax for points that happen to lie on the
>real number line
>

It offers several and we're free to define one for owl:real. If we
use any decimal notation, we have exactness problems (e.g., 1/3),
but decimal is very user friendly. So, I was thinking that the valid
syntax for a real would be decimal floating points and ratios of
integers. We could include scientific notation as well.

Why on earth would the OWL group come up with their own syntax for
encoding numbers? The XSchema guys have already done that, and people
have implemented parsers for their spec. If there's going to be a
syntax for rationals or algebraics, then that seems to be right up
their alley.

>But my main point is that users have no interest in the "holes"
>introduced by the xsd:float value space: providing them access to a
>value space of numbers representable in float representation is not
>useful, and could lead to lots of confusion, particularly if users
>could easily use such a space "by accident".
>

Well, you'll get exactness holes with binary or decimal notation,
regardless of density issues.

I thought I had made my proposal clear on this: the value space does
not have holes. The representations supported for particular values
are not sufficient to address all the points in that space, but the
space itself does *not* have holes.

>I don't know what you mean by "lexical space of the reals".
>

XSD datatypes have a lexical space (e.g., the syntax) and a value
space. You are suggesting, I thought, that we adopt a value space
that is the reals and something about using xsd syntax (i.e.,
lexical spaces) for the syntax.

For the syntax of particular values. I keep trying to stress that
value spaces should be kept separate from the syntax used for
particular values.

XSD offers exact syntax only for binary and decimals (I believe it's
exact for binary). I was wondering what sort of lexical space you
want.

XSD offers a well-defined mapping from lexical representation to IEEE
floats. XSD defines an *exact* value for each valid lexical
representation. You may not like the way the mapping is defined
(because the value of "1.1e0^^xsd:float" on the real number line is
not equal to the value of "1.1^^xsd:decimal"), but there is no
imprecision whatsoever about what each string represents. I am
satisfied with the work the XSchema group did on floating-point
lexical representations.
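
The float versus decimal difference can be checked with a short Python sketch. `xsd_float_value` is a hypothetical helper, not part of any XSD library; it assumes finite literals only (no INF or NaN) and rounds through a 64-bit double first, which in rare cases could differ from rounding the string directly to 32 bits.

```python
import struct
from fractions import Fraction


def xsd_float_value(lexical: str) -> Fraction:
    """Exact rational value of the 32-bit float nearest to the literal.

    Assumption: the literal is finite. Rounding goes string -> double ->
    float32, which is adequate for illustration but is not a full
    implementation of the XSD mapping.
    """
    nearest = struct.unpack(">f", struct.pack(">f", float(lexical)))[0]
    return Fraction(nearest)


as_float = xsd_float_value("1.1e0")
as_decimal = Fraction("1.1")

print(as_float)                # 9227469/8388608
print(as_decimal)              # 11/10
print(as_float == as_decimal)  # False
```

Each string denotes exactly one point; the two points are simply different.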

>But implementations should allow users to specify particular points
>in that value space using the lexical representations for
>`xsd:float` and `xsd:int` values.
>

So you want a very broad lexical space for our real type, i.e., "1",
"1.0", and "12.78e-2".

No. I want `real` to be a value space with no lexical connotations.
I want to be able to specify a particular point in this value space
using a string such as "1.0e0^^xsd:float".
The XSD lexical forms are not "the lexical space for reals". There is
no such thing as "the lexical space for reals". There is such a thing
as "the space of lexical representations which a conformant
implementation must support for particular values in the real value
space", but this space is much smaller than the real value space.
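
As a rough sketch of that separation (hypothetical names throughout, not a proposed API): one value space, modelled here with exact rationals, and a small table of lexical syntaxes that can name points in it. Only `xsd:int` and `xsd:float` are listed, matching the required set suggested above; range checks for `xsd:int` and the special float values are left out.

```python
import struct
from fractions import Fraction


def _from_int(lexical: str) -> Fraction:
    # Assumption: no check that the value fits the 32-bit xsd:int range.
    return Fraction(int(lexical))


def _from_float32(lexical: str) -> Fraction:
    # Round to the nearest 32-bit float, then take its exact value.
    # Assumption: finite literals only; INF and NaN are not handled.
    single = struct.unpack(">f", struct.pack(">f", float(lexical)))[0]
    return Fraction(single)


# Syntaxes for naming points; none of these is itself a value space.
REQUIRED_SYNTAXES = {
    "xsd:int": _from_int,
    "xsd:float": _from_float32,
}


def point_on_real_line(lexical: str, datatype: str) -> Fraction:
    """Map a typed literal to the point it names on the real line."""
    try:
        parse = REQUIRED_SYNTAXES[datatype]
    except KeyError:
        raise ValueError(f"unsupported syntax: {datatype}") from None
    return parse(lexical)


print(point_on_real_line("1.0e0", "xsd:float") == point_on_real_line("1", "xsd:int"))  # True
```

Adding `xsd:double` or `xsd:long` would mean adding entries to the table; the value space itself would not change.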

If we want exactness for the rationals, we need either to allow
repeating (e.g., 0.333repeating) (usually done with a macron) or
fraction syntax (e.g., 1/3).

I don't intend to support exactness for rationals. A conformant
implementation should only be required to provide exact support for
`xsd:int` and `xsd:float` values.

>I expect most implementations will also support points represented
>as `xsd:double` and `xsd:long` as well.
>

You mean their syntax, i.e., their lexical space.

Supporting these syntaxes means that reasoners must also support
reasoning with the particular values representable in those syntaxes.
Support for additional syntaxes does not change the underlying
semantics of the real number line, but it might make implementation of
those semantics a bit harder.

(Sorry for using the XSD terminology, but I think it's a bit clearer
if we stick to it for the moment.)
>
>I
>do *not* think a conformant implementation should have to deal
>with arbitrary points represented as `xsd:decimal` (since the vast
>majority of users don't need the extra representational power, and
>there is substantial implementation burden and performance penalty
>for dealing with such values correctly).
>

Given that more and more languages (e.g., Java) now bundle a decimal
type with their core libraries, I'm not so clear on the first.

I'm not sure Java is an example of "more and more languages". In fact
it is the flagship "you only ever need one language" proposal. And
even in Java you have to program differently if you're going
to play with polymorphic numbers than you would if you stuck to ints
and floats.

I'd like to write a distributed OWL reasoner in Erlang. But Javascript
and C are perhaps more persuasive counterexamples to your argument.

I'd like to hear more about the second.

The most efficient bignum and decimal libraries are an order of
magnitude slower than corresponding int and float calculations.
Hardware is good with ints and floats.

>a vocabulary for what it means to "support" a numeric xsd
>type for particular values would be useful.
>

This is what we're after. Anything we spec will be tightly specced.
At the moment, we only have required and optional as modalities of
support. I think supporting various levels of precision (or variant
mapping) would be quite hard to understand.

But presumably you're making clear that implementations which
implement some "optional" functionality, but do so in a way which
contradicts the optional semantics, are non-compliant. If so, then
specifying what support for additional lexical representations means
(i.e. exact) would make clear that a product which parsed
`xsd:decimal` but internally converted to floating point would not
"support `xsd:decimal`" by the terms of the OWL 2.0 spec. The
implementors could always claim "partial support", however.

In this model, users would just have to decide between integers and
reals. We could have quite a wide lexical space for reals (and even
for integers, i.e., allow 1.0 to mean the integer 1).

I'm getting really confused about what you're talking about. Values
appearing in XML and RDF OWL 2.0 files should be typed; there's no
need at all to guess the type based on syntax.

And of course "1.0e0^^xsd:float" and "1^^xsd:integer" are exactly the
same point on the real number line.

But "0.1"^^xsd:float would not be required, but also we wouldn't
change the meaning along the lines you suggest (we'd just be silent
about it). It's fairly simple to migrate old ontologies to the new
one with a simple converter. If enough implementations did it
silently, that would be information for a future group.

No idea what this means. But I'm guessing I disagree with it.

-rob

  #4  
July 6th, 2008, 05:20 PM
Bijan Parsia

Jul 6, 2008, at 8:07 PM, Rob Shearer wrote:

Most importantly, I do not think there is necessarily a direct
correlation between the lexical representations used to represent
particular values and the value spaces in which those particular
values live. I.e. users want to be able to specify particular
values within the `real` value space using `xsd:float`,
>>

>You mean the type name or the lexical syntax (e.g., "12.78e-2")?
>

XSD offers a lexical syntax for points that happen to lie on the
real number line

It offers several and we're free to define one for owl:real. If we
use any decimal notation, we have exactness problems (e.g., 1/3), but
decimal is very user friendly. So, I was thinking that the valid
syntax for a real would be decimal floating points and ratios of
integers. We could include scientific notation as well.

That's what I suggest using it for. The easiest approach is that
xsd names on their own are not valid "datatypes"; particular values
encoded using xsd, however, are (because particular values are
single-element value spaces).
>
>I'm personally more comfortable with allowing the latter than
>pushing "xsd:float" as a synonym for the real value space. Your
>mileage obviously varies.
>>

but they do *not* have any interest in use of the `xsd:float`
value space.
>>

>Some do at least to the extent of wanting NaN (and perhaps -0).
>I'd personally prefer not to shove them into the real type
>(certainly NaN; I suppose we could make our reals the affine reals
>and handle +inf).
>

I'd endorse including only one zero, but I agree there's an issue
with NaN.

And the infinities, though we could always go for the affine real line.

My principled stand is that it's inconsistent (a value space of
size zero), but I'd definitely want to analyze the use cases to see
who loses important functionality from that decision.
>

But my main point is that users have no interest in the "holes"
introduced by the xsd:float value space: providing them access to a
value space of numbers representable in float representation is not
useful, and could lead to lots of confusion, particularly if users
could easily use such a space "by accident".

Well, you'll get exactness holes with binary or decimal notation,
regardless of density issues.

That's the situation we've fallen into with floats in OWL 1.0.
>

Thus we've got two orthogonal concepts which happen to coincide
for strings and integers but not for real numbers.

My proposed solution would be to use brand-new OWL names for all
value spaces, but use xsd syntax to specify particular values.
>>

>Could you say what you think the lexical space of the reals should
>include?
>

I don't know what you mean by "lexical space of the reals".

XSD datatypes have a lexical space (e.g., the syntax) and a value
space. You are suggesting, I thought, that we adopt a value space
that is the reals and something about using xsd syntax (i.e., lexical
spaces) for the syntax. XSD offers exact syntax only for binary and
decimals (I believe it's exact for binary). I was wondering what sort
of lexical space you want.

I don't propose defining the reals lexically;

Sure.

I propose defining the value space mathematically.

Well, of course. But that's what XSD does as well. The decimals are a
well defined mathematical set.

But implementations should allow users to specify particular points
in that value space using the lexical representations for
`xsd:float` and `xsd:int` values.

So you want a very broad lexical space for our real type, i.e., "1",
"1.0", and "12.78e-2". If we want exactness for the rationals, we
need either to allow repeating (e.g., 0.333repeating) (usually done
with a macron) or fraction syntax (e.g., 1/3).
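
A small Python sketch of the exactness point, using the standard `fractions` module. The helper name is hypothetical; it relies on the fact that a rational in lowest terms has a finite decimal expansion exactly when its denominator has no prime factors other than 2 and 5.

```python
from fractions import Fraction


def has_finite_decimal_expansion(q: Fraction) -> bool:
    """True if q can be written exactly with finitely many decimal digits."""
    d = q.denominator  # Fraction keeps values in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


print(has_finite_decimal_expansion(Fraction("12.78e-2")))  # True
print(has_finite_decimal_expansion(Fraction(1, 3)))        # False
print(Fraction(1, 3) * 3 == 1)                             # True
```

So "12.78e-2" is exact in decimal or scientific notation, while 1/3 needs either a repeating-digit notation or a fraction syntax to be written exactly.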

I expect most implementations will also support points represented
as `xsd:double` and `xsd:long` as well.

You mean their syntax, i.e., their lexical space.

(Sorry for using the XSD terminology, but I think it's a bit clearer
if we stick to it for the moment.)

I
do *not* think a conformant implementation should have to deal
with arbitrary points represented as `xsd:decimal` (since the vast
majority of users don't need the extra representational power, and
there is substantial implementation burden and performance penalty
for dealing with such values correctly).

Given that more and more languages (e.g., Java) now bundle a decimal
type with their core libraries, I'm not so clear on the first. I'd
like to hear more about the second.

>At least, as a first cut? (It seems decimal, scientific, and
>rational notation would all be useful, the first two for common
>ways of writing and the third for full coverage of the rationals.)
>

The WG should consider that some implementations might allow lots
of xsd syntaxes but lose precision on some of them (allow use of
`xsd:decimal` in ontology files for user convenience, but convert
them to floats during parsing)

This can cause quite serious interoperability problems.
So I'm inclined against it on first blush.

a vocabulary for what it means to "support" a numeric xsd
type for particular values would be useful.

This is what we're after. Anything we spec will be tightly specced.
At the moment, we only have required and optional as modalities of
support. I think supporting various levels of precision (or variant
mapping) would be quite hard to understand.

My big concern here is that an ontology will be developed and
tested with a reasoner with "full" `xsd:decimal` support but then
when it's used with an implementation with "imprecise"
`xsd:decimal` support everything goes pear-shaped.

That would be bad :) There could be subtler problems if people mapped
decimal syntax to binary in variant ways (i.e., which float do you
take 0.1 to?)

Spitting out warnings during parsing isn't a great solution
>

And of course some implementations might offer additional value
spaces as well, but I'd like the spec to make it very clear that
this is a very different thing than the above. For one thing, I'd
suggest outlawing any use of names within the xsd namespace for
value spaces, even spaces implementors have added as extensions.
"Support for `xsd:decimal`" should mean `xsd:decimal` syntax for
points on the real number line and nothing else.

This doesn't seem likely. Existing implementations already do
different things with different xsd types. It'll be very hard to get
buy in from the RDF community. It seems like a more likely strategy
is to fix a (required) set of OWL types (or core types) which are
easy to understand and robust with respect to intuitive behavior, and
leave the more specialized types for future people to standardize.

In this model, users would just have to decide between integers and
reals. We could have quite a wide lexical space for reals (and even
for integers, i.e., allow 1.0 to mean the integer 1). But
"0.1"^^xsd:float would not be required, but also we wouldn't change
the meaning along the lines you suggest (we'd just be silent about
it). It's fairly simple to migrate old ontologies to the new one with
a simple converter. If enough implementations did it silently, that
would be information for a future group.

Thanks again.

Cheers,
Bijan.

  #5  
July 7th, 2008, 07:20 AM
Rob Shearer

The XSchema guys have already done that, and people have
implemented parsers for their spec. If there's going to be a
syntax for rationals or algebraics, then that seems to be right
up their alley.

They don't seem interested, alas.
>>

>And I very much hope the OWL WG takes that as a sign that they
>should be even less interested.
>

The reason (one member) gave (privately) is that they didn't think
that reals beyond decimals were necessary for a schema language. I
think we agree that they are for an ontology language. So, my
conclusion is the opposite of your hope.

Rational numbers, and linear equations, and n-ary data predicates, all
seem *much* more relevant to data representation and model checking
than satisfiability reasoning; these are systems people want to use to
store and compute particular values based on input, not to check
satisfiability. (The n-ary datatype use cases, for example, don't
offer much insight into how such a feature could be used to draw
valuable new inferences.) And yet the XSchema group, the data
representation and model-checking crowd, decided that such notions
were far too ambitious for even them.

Again, I urge the OWL working group to follow that example and focus
on the small set of features which will actually benefit users, and
make sure that they get those features right.

-rob

  #6  
July 7th, 2008, 09:20 AM
Rob Shearer

>The integer data domain is a subset of the number data domain.
>There is absolutely no need for a float data domain. OWL
>implementations should support particular values encoded using the
>`xsd:int` and `xsd:float` lexical representations. These values are
>all in the number domain.
>

This goes against existing implementation and use, wherein xsd:float
is disjoint from xsd:int.

Cerebra did not make them disjoint. Neither does KAON2. And testing
reveals that neither does FaCT++. The only reasoner I can find that
makes them disjoint is Pellet. This sheds a lot of light on your
definition of "existing implementation and use".

If you intend to attempt to enshrine bugs in Clark & Parsia products
in the OWL standard, I suggest that you do it as a representative of
Clark & Parsia and not as a representative of Manchester, which has an
interest in FaCT++.

-rob
