Thursday, October 15, 2009

the surprisingly interesting story of numerical rounding

Binary floating-point math is inherently flawed. 0.1₁₀ is the same as 0.0001100110011001100110011001100110011001100110011...₂, which gets rounded to 53 significant bits (the significand precision of an IEEE 754 double), which then translates back as 0.1000000000000000055511151231257827021181583404541015625₁₀. It's the same problem as trying to represent one-third in base ten; any limit to the number of threes after the decimal point makes it fundamentally a different number. (Note that in base 3, one-third is very cleanly represented as 0.1₃.)
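
You can watch this happen from Python (just one illustrative language choice): feeding a float to Decimal exposes the exact value the hardware actually stores, rather than the nicely rounded string you normally see.

    from decimal import Decimal

    # Decimal(float) captures the exact binary value held by the float.
    print(0.1)           # 0.1 -- the printed form is rounded for display
    print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625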

You may then think, "forget that! I'll do my arithmetic on decimal digits stored in text strings so that I have perfect accuracy!" But even text strings are limited -- consider the aforementioned representation of one-third, which would need an infinite number of threes. Even adding a mechanism to designate repeating digits will not handle irrational numbers such as sqrt(2).
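
Exact rationals (Python's fractions module, standing in here for the text-string idea) make the one-third case work perfectly, but they still can't capture sqrt(2) -- a sketch:

    import math
    from fractions import Fraction

    third = Fraction(1, 3)        # stored exactly as numerator/denominator
    print(third + third + third)  # 1 -- exactly, with no rounding anywhere

    # sqrt(2) is irrational: no ratio of integers equals it, so the best a
    # Fraction can hold is the 53-bit float approximation, whose square
    # is therefore not exactly 2.
    root = Fraction(math.sqrt(2))
    print(root * root == 2)       # False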

So we're stuck in an imperfect world. The best anyone can do is try to recapture the original value by rounding. There are many algorithms for rounding, despite what you remember from elementary school math. The one everyone knows is called "round half away from zero". That is, a fractional part of exactly 0.5 is always rounded up on positive numbers and down on negative numbers: 1.5 becomes 2, 2.5 becomes 3, and -3.5 becomes -4. However, the accounting world uses a different one called "round half up", where 0.5 is always rounded up regardless of whether the number is positive or negative, so 1.5 still becomes 2 but -3.5 becomes -3.
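
Here's a minimal sketch of the two modes (hypothetical helper names, and it glosses over float edge cases where x + 0.5 itself gets rounded):

    import math

    def round_half_away_from_zero(x):
        # Ties move away from zero: 1.5 -> 2, 2.5 -> 3, -3.5 -> -4.
        return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

    def round_half_up(x):
        # Ties always move toward positive infinity: 2.5 -> 3, -2.5 -> -2.
        return math.floor(x + 0.5)

    print(round_half_away_from_zero(-3.5))  # -4
    print(round_half_up(-3.5))              # -3

A terminology caveat: Python's own decimal module uses the name ROUND_HALF_UP for the away-from-zero behavior, so these labels are not universal across systems.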

Both of these introduce bias, which is math-speak for "systematic long-term error". Imagine a long stream of uniformly distributed random fractional numbers -- all the fractional parts under 0.5 that get rounded down are, in the long run, balanced out by all the fractional parts above 0.5 that get rounded up, so those contribute zero long-term error. But with "round half up", every exact 0.5 in the stream introduces a +0.5 rounding error that nothing balances out. "Round half away from zero" addresses that, but only if the numbers are evenly split between positive and negative, so that the +0.5 errors on positive ties cancel the -0.5 errors on negative ties.
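
The bias is easy to measure in simulation (a sketch; the data mix is contrived to include an exaggerated share of exact .5 ties):

    import math
    import random

    def round_half_up(x):
        # Ties always go toward positive infinity.
        return math.floor(x + 0.5)

    random.seed(1)
    # Ordinary random values plus a batch of exact .5 ties.
    data = [random.uniform(0, 100) for _ in range(10000)]
    data += [i + 0.5 for i in range(1000)]

    bias = sum(round_half_up(x) - x for x in data) / len(data)
    print(bias)  # clearly positive: every tie contributed a +0.5 error

The random values contribute errors that average out to roughly zero, so the leftover positive bias comes entirely from the ties.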

The gold standard for rounding in practical computer science is called "round half to even" (or sometimes "banker's rounding", among other colorful names); it's also the default rounding mode of IEEE 754 floating-point hardware. The idea is to round ties to the nearest even number, e.g. 5.5 and 6.5 both round to 6. This removes the bias even when the positive and negative numbers don't balance out, but only if the range of numbers spans an even number of integers. That is, applying this to numbers between 0 and 2 works great (0.5 rounding down is balanced by 1.5 rounding up), but not to numbers between 0 and 3 (because 2.5 rounds down and there's no corresponding half-point that rounds up).
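
Round half to even is what Python 3's built-in round() does, and it's also the default for the decimal module, so the even-tie behavior is easy to poke at:

    from decimal import Decimal, ROUND_HALF_EVEN

    # Ties snap to the nearest even integer: 5.5 and 6.5 both land on 6.
    for s in ("4.5", "5.5", "6.5", "7.5"):
        print(s, "->", Decimal(s).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
    # 4.5 -> 4, 5.5 -> 6, 6.5 -> 6, 7.5 -> 8

    print(round(4.5), round(5.5))  # 4 6 -- same rule in the builtin

Building the Decimals from strings matters: it keeps the tie values exact, whereas a float like 2.675 has already been nudged off the tie before rounding even starts.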
