# Thread: Precision of Double variables for positive integers

1. ## Precision of Double variables for positive integers

In a recent thread, I wanted to talk about the capacity of Double variables to store positive integers (in comparison with Long ones), and got a bit confused by what was written in the documentation.

For Long variables, the situation seems straightforward. As I understand it, a positive integer is then effectively stored in 63 bits, so that the largest positive integer which can be stored is 2^63 - 1, namely 9,223,372,036,854,775,807, and this corresponds exactly with what the documentation says …

Originally Posted by the documentation
-9,223,372,036,854,775,808 through 9,223,372,036,854,775,807 (9.2...E+18) (signed)
In the case of a Double variable, my understanding is that a positive integer will be effectively (‘approximately’) stored in 52 bits of the mantissa/significand part of the storage, such that the largest positive integer which I would expect to be able to store precisely would be about 2^52, namely 4,503,599,627,370,496 (i.e. 16 significant figures, up to the ceiling). However, the documentation says (of Double variables) …

Originally Posted by the documentation
4.94065645841246544E-324 through 1.79769313486231570E+308 for positive values
This seems to be implying that the mantissa/significand can store 18 significant figures, two more than I would have expected.

Can someone help me to understand this apparent discrepancy/anomaly?

In an attempt to investigate this matter, interestingly, if I assign a positive integer value to a Double variable which has 18 significant figures, such as 912345678901234567, then if I display the value of that variable in, say, a textbox, I see just 15 significant figures (“9.12345678901235E+17”), the least significant one having been ‘rounded up’, and if I format it to show 18 significant figures, I still just get the same 15, followed by 3 trailing zeros (“912345678901235000”). However, if I use CLng() to convert the value to Long, and then display that value in a textbox, I see all 18 figures of the value which was originally assigned to the Double variable, albeit the least significant figure is incorrect in terms of what was assigned (“912345678901234560”).

Whilst the precision I’m getting (for positive integers) with Double is roughly what I would have expected, can anyone help me to understand the above, and the documentation?

Just for the record, this code …

Code:
```        Dim x As Double = 912345678901234567
Dim y As Long = CLng(x)

txt1.Text = CStr(x)
txt2.Text = Format(x, "##################")
txt3.Text = CStr(y)```
… produced the following in the three textboxes…

txt1: 9.12345678901235E+17
txt2: 912345678901235000
txt3: 912345678901234560

Kind Regards, John
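
For anyone wanting to check where exact integer storage in a Double stops, here is a minimal sketch. It assumes the usual IEEE 754 layout (52 stored mantissa bits plus one implied leading bit, 53 in total); the module and variable names are purely illustrative and do not come from the thread.

```vbnet
Imports System

Module ExactIntegerLimit
    Sub Main()
        ' 2^53 written as a Long literal (9,007,199,254,740,992).
        Dim limit As Long = 9007199254740992L

        ' An integer just below 2^53 survives a round trip through Double.
        Dim below As Long = limit - 1
        Console.WriteLine(CLng(CDbl(below)) = below)        ' True

        ' 2^53 + 1 needs 54 significant bits, so it is rounded on conversion.
        Dim above As Long = limit + 1
        Console.WriteLine(CLng(CDbl(above)) = above)        ' False

        ' Number of decimal digits that 53 bits can always carry.
        Console.WriteLine(Math.Floor(53 * Math.Log10(2)))   ' 15
    End Sub
End Module
```

The last line is why about 15 decimal digits can always be trusted, while some (but not all) 16-digit integers are also stored exactly.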

2. ## Re: Precision of Double variables for positive integers

Doubles are floating point values. They're not useful for precision numbers. Decimal will store your values intact, or have a look at BigInteger

3. ## Re: Precision of Double variables for positive integers

Originally Posted by .paul.
Doubles are floating point values. They're not useful for precision numbers. Decimal will store your values intact, or have a look at BigInteger
Yes, I understand all that, and I'm not wanting to store very large numbers.

In the context in which this arose, we are stuck with Double, since that's what the property in question was returning - and I can see no reason why a floating-point variable should not be able to accurately store integers which can be accommodated in the mantissa/significand part of its storage (with an exponent of zero).

As I said, the behaviour I observed empirically was more-or-less as I had expected, but it's the documentation that I found difficult to understand - how can a floating point variable store a number with 18 decimal digits when only ~52 bits are available for its storage? ... or am I missing something?

Kind Regards, John

4. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
Yes, I understand all that, and I'm not wanting to store very large numbers.

In the context in which this arose, we are stuck with Double, since that's what the property in question was returning - and I can see no reason why a floating-point variable should not be able to accurately store integers which can be accommodated in the mantissa/significand part of its storage (with an exponent of zero).

As I said, the behaviour I observed empirically was more-or-less as I had expected, but it's the documentation that I found difficult to understand - how can a floating point variable store a number with 18 decimal digits when only ~52 bits are available for its storage? ... or am I missing something?

Kind Regards, John
A Long data type will store exact numbers, a Double gets the increase in size by reducing precision - numbers may not be exact.

5. ## Re: Precision of Double variables for positive integers

Well if it’s just a theoretical question, and you’re looking for clarification, maybe someone else can help you with that...

6. ## Re: Precision of Double variables for positive integers

Originally Posted by PlausiblyDamp
A Long data type will store exact numbers, a Double gets the increase in size by reducing precision - numbers may not be exact.
May not? Almost definitely will not in my experience

7. ## Re: Precision of Double variables for positive integers

Originally Posted by .paul.
May not? Almost definitely will not in my experience
In fact
Code:
```    Sub Main(args As String())

Dim d1 As Double = 10 ^ 250
Dim d2 As Double = (10 ^ 250) + 1

Console.WriteLine(d1 = d2)
End Sub```
proves exactly that

8. ## Re: Precision of Double variables for positive integers

You may find this link to be of interest:

IEEE Floating Point

I've never really bothered to get into it all that much. Double is faster than Decimal, so it is the floating point mechanism by default. Decimal has more precision, so should be used for currency, and wherever else necessary. When it comes to integers stored in floating point variables...it's just a matter of how the type works, which the link explains to a pretty good degree.

9. ## Re: Precision of Double variables for positive integers

Originally Posted by PlausiblyDamp
A Long data type will store exact numbers, a Double gets the increase in size by reducing precision - numbers may not be exact.
It's not just a question of precision or exactness - it's about how, as the documentation seems to imply, one can get a number with 18 decimal digits out of 52 bits of binary storage.

Kind Regards, John

10. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
You may find this link to be of interest: IEEE Floating Point
Yes, I fully understand IEEE floating point representation, and that is the very basis of my point. That article confirms, as I have said, that there are 52 bits for the mantissa in a 64-bit double-precision IEEE representation - so I have to again ask, how has the vb.net documentation got a number with 18 decimal digits out of 52 bits of binary storage?

Kind Regards, John

11. ## Re: Precision of Double variables for positive integers

Originally Posted by PlausiblyDamp
In fact
Code:
```    Sub Main(args As String())

Dim d1 As Double = 10 ^ 250
Dim d2 As Double = (10 ^ 250) + 1

Console.WriteLine(d1 = d2)
End Sub```
proves exactly that
That's not surprising - we know that floating-point operations can be imprecise when one is not dealing with an integer that can be fully accommodated within the mantissa part of the storage. However, the latter is the situation I'm talking about - try it again with 10 ^ 15 or less.

Kind Regards, John
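
The two cases being argued over can be put side by side in a short sketch. It reuses the 10 ^ 250 test posted above and adds the 10 ^ 15 case suggested here; the module and variable names are illustrative only.

```vbnet
Imports System

Module AddOneTest
    Sub Main()
        ' 10^250 is far above 2^53, so neighbouring Doubles are many integers
        ' apart and adding 1 is lost in rounding.
        Dim big As Double = 10 ^ 250
        Console.WriteLine(big = big + 1)        ' True

        ' 10^15 is below 2^53, where adjacent Doubles are less than 1 apart,
        ' so adding 1 produces a different stored value.
        Dim small As Double = 10 ^ 15
        Console.WriteLine(small = small + 1)    ' False
    End Sub
End Module
```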

12. ## Re: Precision of Double variables for positive integers

Originally Posted by .paul.
Well if it’s just a theoretical question, and you’re looking for clarification, maybe someone else can help you with that...
It's not a theoretical or hypothetical question. It's an attempt on my part to understand why/how the documentation says something which had me misled for a good while, until I thought things through and realised that what it said didn't seem to make sense. As you say, maybe someone can help me gain that understanding.

Kind Regards, John

13. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
It's not a theoretical or hypothetical question. It's an attempt on my part to understand why/how the documentation says something which had me misled for a good while, until I thought things through and realised that what it said didn't seem to make sense. As you say, maybe someone can help me gain that understanding.

Kind Regards, John
You might’ve found an error in the documentation. It’s not unheard of...

To err is human, but to really %#$? things up requires a computer

14. ## Re: Precision of Double variables for positive integers

Originally Posted by .paul.
You might’ve found an error in the documentation. It’s not unheard of...
Indeed so. However, as you will presumably understand, I'm trying to get some reassurance that it is an error in the documentation, rather than something I don't understand properly.

I initially 'made the mistake' of simply counting the number of decimal digits indicated in the documentation's specification of the range of Double variables. It was only significantly later (after probably having written some incorrect things in posts as a result of what I'd read) that I stopped and thought, and realised that the number I'd counted did not appear to make sense. However, as I said, I'm not discounting the possibility that I don't understand properly, or am not thinking straight!

Kind Regards, John

15. ## Re: Precision of Double variables for positive integers

OK, I guess I'm not quite clear on the question. In your initial post, you showed the documentation showing one thing, but that one thing wasn't necessarily showing an integer. Once you get away from integers, the whole argument becomes kind of moot, because you then get into the question of how well any system of numbers can display certain values. There are always fractions that can't be cleanly displayed, depending on the number system.

However, a computer will also have a display. So, you show this from the documentation: 1.79769313486231570E+308
Are you assuming that all the digits in the mantissa are meaningful? I would guess that a bunch of what appears to be the mantissa, in this case, is really just a display artifact. They're taking a base 10 fraction, packing it into a binary number by some means, then taking it back to a base 10 value for display. I can't say I have ever looked at it, and the exercise seems purely academic, but it seems likely to be a display artifact resulting from the double conversion and printing out N digits of a number that lacks N digits of precision, and thus the display is essentially garbage past some digit much smaller than N.

16. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
OK, I guess I'm not quite clear on the question. In your initial post, you showed the documentation showing one thing, but that one thing wasn't necessarily showing an integer.
With respect, that's not the point. The documentation was showing a number (well, two numbers) which had 18 decimal digits in their mantissas, yet, with the IEEE format, there is only space to store mantissas up to 2^52 (around 15 decimal digits).

Originally Posted by Shaggy Hiker
So, you show this from the documentation: 1.79769313486231570E+308. Are you assuming that all the digits in the mantissa are meaningful?
No, not necessarily meaningful (indeed, the final ones presumably aren't) - but I'm assuming that they must be stored somewhere - and, as above, there simply is not enough space in the IEEE format to store that many digits.

.Net clearly understands that since, even if one assigns an integer with 18 decimal digits to a Double variable, when asked to display it, .net only displays a mantissa of 15 digits (which, as I've said, is the most that the IEEE format can store).

Originally Posted by Shaggy Hiker
... I can't say I have ever looked at it, and the exercise seems purely academic, but it seems likely to be a display artifact resulting from the double conversion and printing out N digits of a number that lacks N digits of precision, and thus the display is essentially garbage past some digit much smaller than N.
As above, I'm not talking about precision, conversions or display artefacts - I'm simply talking about the fact that the documentation is showing Double variable values with mantissas that have more decimal digits than can possibly be stored in the mantissa part of a 64-bit Double variable.

I was going to suggest that you tried it by assigning some values with a large number of decimal digits to a Double variable and then see how it gets displayed - but I find that vb.net will not let one do that. If one types lots of digits, it seems that it just chops off all those in excess of the number it can store in the mantissa of a Double the moment one moves away from the line of typing!

Kind Regards, John

17. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2

I was going to suggest that you tried it by assigning some values with a large number of decimal digits to a Double variable and then see how it gets displayed - but I find that vb.net will not let one do that. If one types lots of digits, it seems that it just chops off all those in excess of the number it can store in the mantissa of a Double the moment one moves away from the line of typing!

Kind Regards, John
I was going to say that was kind of my point, but then I did a test and realized I was wrong about that. The values in the documentation happen to be what you get when you do this:

Double.MaxValue.ToString

So, the documentation is correct...ish, as the value shown in the documentation is the value the code returns. However, that does seem to be at odds with the number of bits in the mantissa...if the Double type just follows the IEEE spec, which I don't know whether it does.

I'm not sure that's quite correct, though. I'm thinking of the mantissa as a number, with the exponent being a different number. That's how it is displayed, but I don't think that's correct, because the exponent doesn't work all that well, either. Therefore, I assume that what is displayed as the mantissa is not what is found in the 52 bits, but is a longer set of digits that you get when a binary representation of a decimal number is stored in the IEEE format, then gets turned back into decimal. The details of that, I've never looked into, though.

18. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
The values in the documentation happen to be what you get when you do this:
Double.MaxValue.ToString
So, the documentation is correct...ish, as the value shown in the documentation is the value the code returns.
Well, for a start, not for me - if I push that into a textbox, what I get is ...

1.79769313486232E+308

... which is the 15 decimal mantissa digits I would roughly expect, missing off the last three digits of what's in the documentation (and rounding up the final remaining digit).

However, even if it were identical to what is in the documentation, I don't think that would prove much. I'm sure that the value that Double.MaxValue returns is not determined/calculated when it is asked for - it will simply be a constant programmed into the software - so I would suggest that either an incorrect constant in software has been copied into the documentation or an incorrect figure in the documentation has been programmed into the software as a constant!

Originally Posted by Shaggy Hiker
However, that does seem to be at odds with the number of bits in the mantissa...if the Double type just follows the IEEE spec, which I don't know whether it does. I'm not sure that's quite correct, though. I'm thinking of the mantissa as a number, with the exponent being a different number. That's how it is displayed, but I don't think that's correct, because the exponent doesn't work all that well, either. Therefore, I assume that what is displayed as the mantissa is not what is found in the 52 bits, but is a longer set of digits that you get when a binary representation of a decimal number is stored in the IEEE format, then gets turned back into decimal. The details of that, I've never looked into, though.
It is, of course, possible that .net does not follow the IEEE spec, but that would seem extremely unlikely - and, as for what you suggest above, it goes against everything I've ever been taught and learned about the IEEE spec. My understanding has always been that the mantissa and exponent parts are totally separate, of fixed length and each do 'just what it says on their tins'.

In any event, one only has to be able to count to understand that, no matter how things are arranged within the 64 bits, what is presented in the documentation is not possible for any 64-bit variable.

The decimal digits of the mantissa the documentation gives are: 179769313486231570 ... which when converted to binary become:

1001111110101010110011110011110111110111001101100000010010

... which is 58 bits. The digits of the exponent the documentation gives are 308, which when converted to binary become:

100110100

... which is another 9 bits. Even without considering the possibility of sign bits, 58 + 9 = 67, hence too many to fit in a 64 bit (8 byte) memory location, no matter how arranged.

In contrast, in what Double.MaxValue returns for me (as above), the digits of the displayed mantissa (179769313486232) converts to binary as:

101000110111111111001110000100100110010110011000

... which is just 48 bits - which, added to the 9 for the exponent, is just 57 bits - which fits easily in 64 bits!

Kind Regards, John
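
One way to see what is actually held in the 64 bits, without going through any decimal string, is to pull the fields apart directly. The sketch below assumes the standard IEEE 754 double layout (1 sign bit, 11 exponent bits biased by 1023, 52 fraction bits) and uses BitConverter.DoubleToInt64Bits; the module and variable names are illustrative only.

```vbnet
Imports System

Module DoubleFields
    Sub Main()
        ' Reinterpret the 64 bits of the Double as a Long without changing them.
        Dim bits As Long = BitConverter.DoubleToInt64Bits(Double.MaxValue)

        ' Layout: 1 sign bit, 11 exponent bits (biased by 1023), 52 fraction bits.
        Dim sign As Long = (bits >> 63) And 1L
        Dim biasedExponent As Long = (bits >> 52) And &H7FFL
        Dim fraction As Long = bits And &HFFFFFFFFFFFFFL

        Console.WriteLine(sign)                                   ' 0
        Console.WriteLine(biasedExponent - 1023)                  ' 1023 (a power of 2, not of 10)
        Console.WriteLine(Convert.ToString(fraction, 2).Length)   ' 52 (every fraction bit is set)
    End Sub
End Module
```

The stored exponent is a power of two (1023), not the decimal 308, and the fraction is a binary fraction rather than a string of decimal digits. The decimal text 1.79769313486231570E+308 is therefore derived from those fields when the value is printed. It is not stored, so counting the bits needed for its decimal digits does not measure the storage used.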

19. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
Well, for a start, not for me - if I push that into a textbox, what I get is ...

1.79769313486232E+308
Now THAT's interesting. You're right. If I hover over Double.MaxValue, I get the value from the documentation. If I add the .ToString to put it into any string-based control (label or textbox, for instance), then I get the value you are showing. So...I'm really not quite sure what to make of that. ToString on a Double appears to be displaying a different value from what the Double itself appears to be holding. What's really peculiar about that is that when you hover over a variable, you get a tooltip, which I thought was just the .ToString return for that variable. I guess it isn't ALWAYS that, but I had a thread a few years back where I was looking at something like that.

Originally Posted by JohnW2
However, even if it were identical to what is in the documentation, I don't think that would prove much. I'm sure that the value that Double.MaxValue returns is not determined/calculated when it is asked for - it will simply be a constant programmed into the software - so I would suggest that either an incorrect constant in software has been copied into the documentation or an incorrect figure in the documentation has been programmed into the software as a constant!
I would not expect that to be the case. That would be a risky way to write things. Better to just set the bits and return the value.

Beyond that, it's just speculation.

20. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
Now THAT's interesting. You're right. If I hover over Double.MaxValue, I get the value from the documentation.
Yes, I noticed that. However, when one does that hovering, it says that it is "(constant)" and then, just in case one didn't read that, also goes on to say "This field is constant". I think that this gives a reasonable amount of support to my suggestion/suspicion that it is a hard-coded constant, rather than a value 'determined' when needed.

Originally Posted by Shaggy Hiker
If I add the .ToString to put it into any string-based control (label or textbox, for instance), then I get the value you are showing. So...I'm really not quite sure what to make of that.
Yep, and the same happens if one uses CStr(Double.MaxValue).

Originally Posted by Shaggy Hiker
ToString on a Double appears to be displaying a different value from what the Double itself appears to be holding.
I'm not sure how you think you know "what the Double itself appears to be holding". Since, as far as I am aware, a Double cannot be converted to anything else other than a string [using ToString or CStr()], I don't know of any other way one could see 'what it was holding'. As I've demonstrated, the one thing we do know is that a Double (or any other 64-bit format one might dream up) cannot possibly hold the value shown in the documentation (or which you see when you hover over it), since 67 bits is more than 64!

Originally Posted by Shaggy Hiker
I would not expect that to be the case. That would be a risky way to write things. Better to just set the bits and return the value.
We may have to agree to disagree about this, since, as above, I still think it most likely that it's a hard-coded constant. I don't really see that as particularly risky, since there is never going to be any change in the storage format of a particular variable type, since that would presumably be catastrophic in terms of backward compatibility. Perhaps more to the point, if it were determined by 'setting the (64) bits', it could not possibly give the value shown in the documentation, or which you see when you hover, since a mantissa full of (64) 'set' bits still only corresponds to 16 decimal digits.

Kind Regards, John

21. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
...
In an attempt to investigate this matter, interestingly, if I assign a positive integer value to a Double variable which has 18 significant figures, such as 912345678901234567,
... However, if I use CLng() to convert the value to Long, and then display that value in a textbox, I see all 18 figures of the value which was originally assigned to the Double variable, albeit the least significant figure is incorrect in terms of what was assigned (“912345678901234560”).
...
It depends on the binary representation of your chosen number
i.e. you chose 912345678901234567

The value stored in the Double was 912345678901234560.

You have the implied bit at the front, so that isn't part of the mantissa
You have trailing 0's so those are not part of the mantissa
The least significant 1 in the mantissa represents a value of 128, so you have three decimal digits represented by a single binary digit, so overall it looks like you stored 18 decimal digits. Because the last digit, 7, didn't fit in the double conversion, you ended up with a series of trailing binary 0s, which are covered by increasing the exponent.
Code:
```Implied                                                          included via exponent
| 52 bit mantissa                                                |     |
| |                                                              |     |
1 10010 10100 10100 11011 00011 11101 00111 01000 10101 00101 11 0000000
|
This 1 represents a value of 128, so that single bit gives you two extra decimal digits```
The value 1.79769313486231570E+308 is essentially a converted approximate version of the true value.

The true value in Base 10 would be raised to the power of 308.25, give or take. Since they want to express the exponent to be an even 308, the decimal version of the "mantissa" is adjusted to remove the .25 portion of the exponent, which makes the adjusted value larger than what would fit in the bits available.

It's just part of the aliasing that happens when you map powers of 2 to powers of 10. They never really match, i.e. 10 is not a power of 2, 100 is not a power of 2. No whole power of 10 is directly representable as a whole power of 2.
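
The "least significant 1 is worth 128" point can be checked numerically. This is a minimal sketch, assuming the IEEE 754 layout described above, in which the gap between adjacent Doubles is 2^(exponent - 52). The module and variable names are illustrative only.

```vbnet
Imports System

Module StoredValue
    Sub Main()
        Dim x As Double = 912345678901234567

        ' Unbiased base-2 exponent, taken from the 11-bit exponent field.
        Dim bits As Long = BitConverter.DoubleToInt64Bits(x)
        Dim exponent As Integer = CInt((bits >> 52) And &H7FFL) - 1023
        Console.WriteLine(exponent)                      ' 59

        ' Gap between adjacent Doubles at this magnitude: 2^(exponent - 52).
        Console.WriteLine(Math.Pow(2, exponent - 52))    ' 128

        ' The stored value is therefore a whole multiple of 128.
        Console.WriteLine(CLng(x) Mod 128)               ' 0
    End Sub
End Module
```

With a gap of 128, the assigned 912345678901234567 cannot be held exactly and is rounded to the nearest multiple of 128, which is the 912345678901234560 that CLng() returned.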

22. ## Re: Precision of Double variables for positive integers

Originally Posted by passel
It depends on the binary representation of your chosen number …. i.e. you chose 912345678901234567 .... The value stored in the Double was 912345678901234560.
Indeed, and I suppose I did not make life any easier for my thinking by not looking and seeing that the number I arbitrarily chose had 7 trailing zeros in its binary representation!

Are we agreed that the maximum value of a positive integer that can be stored precisely in a Double (i.e. with exponent=0) is 2^53 - 1 (1 implied plus 52 explicit mantissa bits, all set), which in decimal is precisely 9,007,199,254,740,991 (aka 9.0...E15)?

Originally Posted by passel
The value 1.79769313486231570E+308 is essentially a converted approximate version of the true value. … The true value in Base 10 would be raised to the power of 308.25, give or take.
Indeed, but I’m still not convinced that it is particularly sensible or helpful to quote an ‘approximate version’ to a higher degree of precision than is actually attainable. Indeed, as we’ve shown, if one actually displays the value of Double.MaxValue one gets what I would call the much-more-sensible figure of

1.79769313486232E+308

… which is a true reflection of the actual precision available (and is therefore an indication of how large an integer can be stored precisely in a Double).

So, I don’t think that this alters my underlying point - that (albeit partially due to my laziness), the documentation confused/misled me for a while.

You know how this arose. In relation to TimeSpans, I was talking about the difference between Elapsed.TotalMilliseconds (Double) and Elapsed.Ticks (Long) - and was making the point that the former (a Double) could give essentially precise figures for elapsed times which were “almost as long” as is the case with the latter (a Long).

However, as said, I was lazy, in that I did not bother to work out what was the largest integer that could be stored precisely as a Double. Instead, I looked at the ‘maximum value' given in the documentation and counted the number of significant decimal digits in what it said - and (since I didn’t stop to think about the illogicality) that led me to believe and say that the difference of ‘integer capacities’ (of Double and Long) was a bit less than it actually is (albeit still a relatively small difference).

Kind Regards, John
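
To put the two 'integer capacities' into elapsed-time terms, here is a rough sketch. It assumes the 100 ns tick used by TimeSpan and takes 2^53 as the point up to which every whole tick count is exactly representable in a Double. The module and variable names are illustrative only.

```vbnet
Imports System

Module TickCapacity
    Sub Main()
        ' Largest count up to which every integer is exactly representable in a Double.
        Dim exactLimit As Long = 9007199254740992L   ' 2^53

        ' Treat that count as 100 ns ticks and see how long a span it covers.
        Dim span As TimeSpan = TimeSpan.FromTicks(exactLimit)
        Console.WriteLine(span.TotalDays / 365.25)   ' roughly 28.5 (years)

        ' For comparison, the longest span a Long tick count can describe.
        Console.WriteLine(TimeSpan.MaxValue.TotalDays / 365.25)   ' roughly 29,000 (years)
    End Sub
End Module
```

So a Double can resolve individual ticks over spans of up to a few decades, against tens of thousands of years for a Long.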

23. ## Re: Precision of Double variables for positive integers

I should have added to what I just posted ....

Originally Posted by passel
The value 1.79769313486231570E+308 is essentially a converted approximate version of the true value.
The value of 1.79769313486232E+308 is certainly the largest that a Double can be displayed as in a control - that is what I see if I assign 1.79769313486231570E+308 to the variable.

If I start adding to that variable value, nothing changes (i.e. displayed result remains as 1.79769313486232E+308) with additions (to 1.79769313486231570E+308) of up to about 9.9792E+291. If I add more than that (to take the value beyond the variable's 'maximum value'), I had expected that I would get an 'overflow error' - but, in fact, what happens is that I get a tiny symbol, which I think is an 'infinity' sign.

In any event, the fact remains that 1.79769313486232E+308 is the largest value of a Double that one can 'display'.

Kind Regards, John
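
The threshold of about 9.9792E+291 observed here is half the gap between adjacent Doubles at the top of the range, namely 2^970. The sketch below assumes arithmetic is carried out in 64-bit IEEE 754 double precision with round-to-nearest, as on typical 64-bit runtimes. The exact power of two is built from its bit pattern rather than computed, and the names are illustrative only.

```vbnet
Imports System

Module OverflowToInfinity
    Sub Main()
        Dim max As Double = Double.MaxValue

        ' Build 2^970 exactly: biased exponent 970 + 1023 = 1993, fraction bits all zero.
        Dim halfGap As Double = BitConverter.Int64BitsToDouble(1993L << 52)

        ' An addend below half the gap is rounded away, leaving the value unchanged.
        Dim unchanged As Double = max + halfGap / 2
        Console.WriteLine(unchanged = max)                         ' expected True

        ' An addend of half the gap or more rounds past the largest finite value.
        Dim overflowed As Double = max + halfGap
        Console.WriteLine(Double.IsPositiveInfinity(overflowed))   ' expected True
    End Sub
End Module
```

Floating-point addition does not raise an overflow error here. The result simply becomes positive infinity, which matches the 'infinity' symbol seen in the textbox.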

24. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
You know how this arose. In relation to TimeSpans, I was talking about the difference between Elapsed.TotalMilliseconds (Double) and Elapsed.Ticks (Long) - and was making the point that the former (a Double) could give essentially precise figures for elapsed times which were “almost as long” as is the case with the latter (a Long).

Kind Regards, John
I hadn't realized that. I think you may have hit on a programming blind spot, in this case. While what you said is technically true, I expect that the people who wrote the underlying code that returns ElapsedTicks. TotalMilliseconds is a Double because all the TotalX properties of the TimeSpan return Doubles, so whoever wrote that TotalMilliseconds was thinking in those terms. When it comes to ticks, though, that might be "closer to the metal". Counting ticks for timing is an ancient practice that predates the time when a floating point value could be used blithely. While you are right in what you said, I would guess that even using a Long Integer for counting ticks was resisted by whoever wrote the ElapsedTicks method as being unnatural.

I don't know that to be true, but we certainly deal with plenty of things in programming that are based on convention rather than expediency, and this may well be one of them. TotalMilliseconds was new thinking, while ElapsedTicks might be old school thinking.

25. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
I hadn't realized that. I think you may have hit on a programming blind spot, in this case. While what you said is technically true, I expect that the people who wrote the underlying code that returns ElapsedTicks. TotalMilliseconds is a Double because all the TotalX properties of the TimeSpan return Doubles, so whoever wrote that TotalMilliseconds was thinking in those terms.
Indeed - but, since they didn't include a TotalMicroseconds and/or TotalNanoseconds (or even Total100Nanoseconds), they really had no choice other than to have TotalMilliseconds as floating-point (hence Double), since the resolution of the timer is such that intervals can go down to 0.0001 milliseconds.

Originally Posted by Shaggy Hiker
When it comes to ticks, though, that might be "closer to the metal". Counting ticks for timing is an ancient practice that predates the time when a floating point value could be used blithely. While you are right in what you said, I would guess that even using a Long Integer for counting ticks was resisted by whoever wrote the ElapsedTicks method as being unnatural.
Both Elapsed.Ticks and ElapsedTicks (don't blame me for the confusion) are, of course, Long. If there were a hypothetical Total100Nanoseconds, that could theoretically also be Long, in which case I imagine that it would be identical to Elapsed.Ticks.

Edit: Of course, given that TotalSeconds is Double, anything smaller (i.e. TotalMilliseconds, or hypothetical TotalMicroseconds etc.) is almost redundant, since all that would differ (from TotalSeconds) is the position of the decimal point.

Kind Regards, John

26. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
Edit: Of course, given that TotalSeconds is Double, anything smaller (i.e. TotalMilliseconds, or hypothetical TotalMicroseconds etc.) is almost redundant, since all that would differ (from TotalSeconds) is the position of the decimal point.

Kind Regards, John
Well, yeah, but MS is a US company, so we don't go for convenient conversions between things. If we were willing to just shift decimal places, we'd be using the metric system.

27. ## Re: Precision of Double variables for positive integers

Originally Posted by Shaggy Hiker
Well, yeah, but MS is a US company, so we don't go for convenient conversions between things. If we were willing to just shift decimal places, we'd be using the metric system.
Well, like it or not, even in the US, once one has got smaller than minutes, you are stuck with a 'metric system' of time measurement

Kind Regards, John

28. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
... Indeed, as we’ve shown, if one actually displays the value of Double.MaxValue one gets what I would call the much-more-sensible figure of

1.79769313486232E+308

… which is a true reflection of the actual precision available (and is therefore an indication of how large an integer can be stored precisely in a Double).
...
Well, there is a fallacy in your logic.
You're looking at the string conversion employed by some software writer and saying that is a true reflection of the actual precision available, but it is not.

When you started with your base value, and were adding to it, you didn't reach an infinity indication for quite a while, i.e. up to around 9.9792E+291 more values.
You were looking at the string converted output of the value, not the raw bits of the value, so you can't see what the true value is, only the value that some string conversion chose to convert to.

You probably didn't hover your mouse over the value to see what was displayed in the tooltip window that the IDE presents to show the value of the variable being hovered over.
You also probably didn't add the variable to the watch window, so you could see what the value of the variable is while stepping through the IDE.
If you did that, you might come away with a different impression.

The string conversion that the controls use chose a simplified approach, and that is to display the value out to 13 decimal digits of precision, so you get values that have been rounded to 13 decimal digits, when the true conversion would display up to 17 digits, depending on the actual binary value of the floating point number.
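The effect of the digit cutoff can be reproduced outside .NET. The following is only a sketch in Python, whose float is the same IEEE 754 64-bit double; the format strings are Python's, not the ones the controls use, and `sys.float_info.max` stands in for Double.MaxValue:

```python
import math
import sys

max_double = sys.float_info.max   # largest finite 64-bit double

# 15 significant digits (1 before the point, 14 after).
fifteen = format(max_double, ".14E")
# 17 significant digits (1 before the point, 16 after).
seventeen = format(max_double, ".16E")

print(fifteen)     # 1.79769313486232E+308
print(seventeen)   # 1.7976931348623157E+308

# The 17-digit string converts back to exactly the same double...
print(float(seventeen) == max_double)   # True
# ...but the 15-digit string was rounded UP past the maximum, so it overflows.
print(math.isinf(float(fifteen)))       # True
```

So the 15-digit text is not just a shorter spelling of the same value: for this particular number it names something slightly larger than any double can hold.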

29. ## Re: Precision of Double variables for positive integers

Originally Posted by passel
You're looking at the string conversion employed by some software writer and saying that is a true reflection of the actual precision available, but it is not.
Indeed - but until I'd read the rest of your post, I hadn't thought of a way that I could see 'the true value of the variable', other than by having it converted to a string ....

Originally Posted by passel
You probably didn't hover your mouse over the value to see what was displayed in the tooltip window that the IDE presents to show the value of the variable being hovered over.
As you say, I didn't do that, but now I have, and I take your point - so maybe, if I wanted to complain about anything, it should be the string conversion?

However, having said that, we seem to be back to the fact that that 'true value' you're talking about is, in fact, a decimal approximation to the true binary value, with that approximation having an 'apparent precision' greater than that which the number of available bits could actually support - so, on reflection, I'm inclined to feel that (as I said before) the string conversion is probably producing a more realistic (sensible?) figure!

Kind Regards, John

30. ## Re: Precision of Double variables for positive integers

I've only just seen this edit ...

Originally Posted by passel
The string conversion that the controls use chose a simplified approach, and that is to display the value out to 13 decimal digits of precision, so you get values that have been rounded to 13 decimal digits, when the true conversion would display up to 17 digits, depending on the actual binary value of the floating point number.
Indeed, but do you not mean 15 (rather than 13) decimal digits (i.e. 1 to the left, and 14 to the right, of the decimal point) (and 1+17 for a 'true conversion')?

If so, as I've just written, I'm inclined to regard that as a more realistic display of the actual number (per its binary representation).

Kind Regards, John

31. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
...

However, having said that, we seem to be back to the fact that that 'true value' you're talking about is, in fact, a decimal approximation to the true binary value, with that approximation having an 'apparent precision' greater than that which the number of available bits could actually support - so, on reflection, I'm inclined to feel that (as I said before) the string conversion is probably producing a more realistic (sensible?) figure!

Kind Regards, John
You probably read my post before I added the last bit. {p.s. as you confirmed}

And perhaps I misspoke somewhat when I called it a decimal approximation, the value 1.79769313486231570E+308 is the true maximum numeric value the double can represent, but the decimal exponent can't be converted to a binary exponent, and the decimal mantissa can't be converted to a binary mantissa directly.
The binary representation of the float, and the decimal representation of the float represent the same number, but you have to adjust what portions of the number are assigned to the exponent and what portion is assigned to the mantissa differently when expressing the number as decimal vs binary.
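As a rough illustration of that split, here is a sketch in Python (same 64-bit IEEE 754 double) that pulls the raw bit fields out of the maximum value and rebuilds the exact integer from them. The variable names are mine, chosen for illustration:

```python
import struct
import sys

max_double = sys.float_info.max

# Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
bits = struct.unpack(">Q", struct.pack(">d", max_double))[0]

sign = bits >> 63                    # 1 bit
exponent = (bits >> 52) & 0x7FF      # 11 bits, biased by 1023
fraction = bits & ((1 << 52) - 1)    # 52 stored mantissa bits

# Restore the implicit leading 1 to get the full 53-bit significand.
significand = (1 << 52) | fraction

# value = significand * 2^(exponent - 1023 - 52), exact in integer arithmetic.
exact = significand * 2 ** (exponent - 1023 - 52)

print(sign, exponent, significand == 2 ** 53 - 1)   # 0 2046 True
print(exact == int(max_double))                     # True
print(len(str(exact)), str(exact)[:17])             # 309 17976931348623157
```

The binary side is a 53-bit significand times 2^971; the decimal side is a 309-digit integer. They are the same number, just carved up differently.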

The controls are arbitrarily rounding the number to 15 significant digits (not 13, as I first wrote) for display, rather than using the true available 14, 15, 16, or 17 digits (depending on the number), and I would not necessarily consider that "sensible", but expedient, and possibly acceptable depending on your needs.

There is a technical balance between the mantissa and exponent that you didn't consider when you said the largest whole integer that can be held without losing precision is 9007199254740991.

That would be a binary number with 53 bits set (1 implicit and 52 mantissa).
If you add 1 to that number, in binary you get 54 bits, i.e. 1 followed by 53 zeros.
Because of the implied 1 bit, and the exponent which can represent the 53 zeros, the actual maximum integer that can be represented as a continuous increment of 1 from the previous number is actually 9007199254740992.
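A quick way to check this boundary is to convert exact integers to a double and back. This is a sketch in Python, where int is arbitrary precision and float is the same 64-bit double:

```python
limit = 2 ** 53   # 9007199254740992

below = float(limit - 1)   # 53 one-bits: fits exactly
at = float(limit)          # 1 followed by 53 zeros: fits via the exponent
above = float(limit + 1)   # needs 54 significant bits: has to be rounded

print(int(below))   # 9007199254740991
print(int(at))      # 9007199254740992
print(int(above))   # 9007199254740992  (rounded to the nearest even neighbour)

# Beyond 2^53 only every second integer is representable.
print(int(float(limit + 2)))   # 9007199254740994
```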

Also, depending on how you write your code, i.e. perhaps adding 1 to a float value in a loop, if you start with the value 9007199254740991 you will see it go to 9007199254740992 but then the next increment will show in the watch window as 9007199254740992 when it is actually 9007199254740993, and then with the next increment of 1 it will go to 9007199254740994. So, even though the double variable had the value 9007199254740992 in it, the value 9007199254740993 was stored somewhere so that when the next increment of 1 happened, the "hidden" value of 9007199254740993 was incremented and the Double was set to 9007199254740994, which is a value it can hold.
Within the loop, you can continue to add 1, and the value in the watch window will increment by 2 every other pass initially, and then increase by 4 every fourth pass, and so on.

How is it possible that you can continue to increment a double by 1 in a loop past the resolution that the double can hold?

Well, the double is pushed on the floating point stack at the beginning of the loop, and 1 is pushed on the stack and they are added. The Floating Point processor works with 80-bit floating point values, so your 64-bit double is converted into an 80-bit double (the exponent is the same, the mantissa just has 16 more bits to work with).
With each addition of 1, that value on the stack is incremented by 1 (with many bits of mantissa to spare), and the upper 64-bits of the 80-bit floating point number (perhaps with rounding of the bottom bit, I don't know) is copied back to the double.
So, generally, while in a loop, or while evaluating the expression on the right-hand side of an assignment, the values can accumulate and work with an extra 16 bits of precision, but when stored back to the double for permanent storage, they will lose that extra precision.
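For contrast, here is what the same loop does when every sum is rounded back to 64 bits straight away, with no wider intermediate kept around. This is a sketch in Python, which stores each result as a plain 64-bit double; I would assume that code using SSE2 rather than the x87 stack behaves the same way, but that is an assumption, not something I have checked in the IDE:

```python
x = float(2 ** 53 - 1)   # 9007199254740991, the last odd integer that fits

seen = []
for _ in range(4):
    x = x + 1.0          # the sum is rounded to a 64-bit double every pass
    seen.append(int(x))

print(seen)
# [9007199254740992, 9007199254740992, 9007199254740992, 9007199254740992]
```

Here the value reaches 9007199254740992 and then stalls, because 9007199254740993 rounds back down each time. Any creeping upward past that point relies on the running total living somewhere wider than the Double itself.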

Some languages and compilers support the extended 80-bit floating point type as a variable type; .NET doesn't.

32. ## Re: Precision of Double variables for positive integers

Originally Posted by JohnW2
I've only just seen this edit ...

Indeed, but do you not mean 15 (rather than 13) decimal digits (i.e. 1 to the left, and 14 to the right, of the decimal point) (and 1+17 for a 'true conversion')?

If so, as I've just written, I'm inclined to regard that as a more realistic display of the actual number (per its binary representation).

Kind Regards, John
You're right, I managed to miscount, apparently more than once.

But, I disagree that the output should be arbitrarily limited to 15 decimal digits. The number of decimal digits that a binary double can represent can exceed 15 digits (as the watch window will show if you watch it as you add values to a large double, compared to what the string conversion displays).
The watch window is displaying the value in Base 10 to the maximum precision available {edit: actually, not. Since as explained below, some numbers could be precisely converted to base 10 with 100s of digits (e.g. sums of powers of 2), the watch window just uses a slightly higher number of digits than the string conversion does, i.e. it limits itself to 16 digits after the decimal point}, while the string conversion is limiting you to 14 digits after the decimal point arbitrarily.

Let's take a simple example.
2^64 can be represented precisely in a floating point Double with no loss of precision.
It can also be represented precisely as a decimal value, 18,446,744,073,709,551,616.

So, you have a number that can be represented in a double with no loss of precision, and which, expressed in decimal exponent notation, would be 1.8446744073709551616E+19

Of course with even some larger numbers the double can represent the number precisely in binary and the decimal value can also be expressed precisely, but be hundreds of digits long.
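Both points can be checked with a short sketch in Python (same 64-bit double). `Decimal(value)` is used because it converts the stored binary value to decimal exactly, with no rounding; the 2^200 case is just an extra example of my own:

```python
from decimal import Decimal

value = float(2 ** 64)

# The double holds 2^64 exactly, and the exact decimal form has 20 digits.
print(int(value) == 2 ** 64)    # True
print(Decimal(value))           # 18446744073709551616

# A 15-significant-digit display throws the tail of that away.
print(format(value, ".14E"))    # 1.84467440737096E+19

# A larger power of two is still exact in binary, but long in decimal.
big = float(2 ** 200)
print(int(big) == 2 ** 200)     # True
print(len(str(int(big))))       # 61
```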

The precision is available for that given number, but you probably wouldn't want your value to be displayed with a wildly varying number of digits just because it can. So, they cut it off at 14 digits after the decimal point for "reasonableness" {edit: and the watch window and IDE chose 16 digits after the decimal point for their version of reasonableness}.

But Double.MaxValue is the true MaxValue, and would display as such if the string conversion just chose 18 digits as its cutoff rather than 15.

33. ## Re: Precision of Double variables for positive integers

Originally Posted by passel
You're right, I managed to miscount, apparently more than once.
I'm reassured to see that I'm not the only one who can miscount and miscalculate

Originally Posted by passel
But, I disagree that the output should be arbitrarily limited to 15 decimal digits. The number of decimal digits that a binary double can represent can exceed 15 digits (as the watch window will show .... some numbers could be precisely converted to base 10 with 100s of digits ... The precision is available for that given number, but you probably wouldn't want your value to be displayed with a wildly varying number of digits just because it can. So, they cut it off at 14 digits after the decimal point for "reasonableness" {edit: and the watch window and IDE chose 16 digits after the decimal point for their version of reasonableness}.
Even though you say above that you 'disagree' with it, that last point of yours is really what I've been getting at, even if you call it "reasonableness" whereas I suggested that it was "realistic (sensible?)". It's far from unusual (e.g. in measuring instruments), perhaps in the name of 'consistency', to restrict the precision of a 'display' to a degree which is appropriate for the entire range of displayable values, even if (as is quite common) higher resolution would be possible in some parts (often one end) of the range.

Kind Regards, John

34. ## Re: Precision of Double variables for positive integers

Originally Posted by passel
.... How is it possible that you can continue to increment a double by 1 in a loop past the resolution that the double can hold? Well, the double is pushed on the floating point stack at the beginning of the loop, and 1 is pushed on the stack and they are added. The Floating Point processor works with 80-bit floating point values, so your 64-bit double is converted into an 80-bit double (the exponent is the same, the mantissa just has 16 more bits to work with).
For what it's worth, I think that the 'Extended Double' ('Long Double') 80-bit formats I've come across have had a 64 or 65 bit mantissa and a 15 bit exponent - i.e. both mantissa and exponent have had more bits than a Double.
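To give a feel for what those extra bits buy, here is a back-of-envelope sketch in Python. The field widths are the ones I understand to apply (53 significand bits including the implicit one and 11 exponent bits for a Double; 64 explicit significand bits and 15 exponent bits for the x87 80-bit format), so treat them as assumptions to verify:

```python
import math

# (significand bits, exponent bits) for each format; assumed widths, see above.
formats = {
    "64-bit Double": (53, 11),
    "x87 80-bit extended": (64, 15),
}

results = {}
for name, (sig_bits, exp_bits) in formats.items():
    # Each significand bit is worth log10(2) decimal digits.
    decimal_digits = sig_bits * math.log10(2)
    # Largest unbiased binary exponent, and the decimal magnitude it reaches.
    max_binary_exp = 2 ** (exp_bits - 1) - 1
    max_decimal_exp = max_binary_exp * math.log10(2)
    results[name] = (decimal_digits, max_binary_exp, max_decimal_exp)
    print(f"{name}: ~{decimal_digits:.1f} digits, "
          f"max 2^{max_binary_exp} (~1E+{max_decimal_exp:.0f})")
```

By this estimate the Double comes out at roughly 16 decimal digits with a range up to about 1E+308, and the 80-bit format at roughly 19 digits with a range up to about 1E+4932, so both precision and capacity increase.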

Kind Regards, John

35. ## Re: Precision of Double variables for positive integers

Right again. I guess I was mixing up my old VAX Floating point formats where the exponent remained the same size between lower and higher precision floating point types compared to IEEE/ Intel X87 formats.

36. ## Re: Precision of Double variables for positive integers

Originally Posted by passel
Right again. I guess I was mixing up my old VAX Floating point formats where the exponent remained the same size between lower and higher precision floating point types compared to IEEE/ Intel X87 formats.
I guess that makes sense if/when one's main concern is about precision, rather than capacity.

In the situation you mentioned, of using 80 bits for intermediate storage during calculations on 64-bit variables, I imagine that there is as much (probably more) concern about capacity as loss of precision, so I suppose it's then logical/reasonable to have more bits in both mantissa and exponent.

Kind Regards, John
