Decimal Arithmetic Specification, version 1.70. Copyright (c) IBM Corporation, 2009. All rights reserved. 7 Apr 2009

# Appendix A – The X3.274 subset

The full specification in the body of this document defines a decimal floating-point arithmetic which gives exact results and preserves exponents where possible. If insufficient precision is available for this, then numbers are handled according to the rules of IEEE 854. The use of IEEE 854 rules implies that special values (infinities and NaNs) are allowed, as are subnormal values and the value –0.

For some applications and programming languages (especially those intended for use by people who are not mathematically sophisticated), it may be appropriate to provide an arithmetic where infinite, NaN, or subnormal results are always treated as errors, –0 results are hidden, and other (largely cosmetic) changes are provided to aid acceptance of results.

The arithmetic defined in ANSI X3.274 is such an arithmetic; this appendix describes the differences between this and the full specification. Implementations which support this subset explicitly might provide the subset behavior under the control of a parameter in the context[1] or might provide a different interface (additional or parameterized methods, for example).

#### Simplified number set

In the subset arithmetic, a reduced set of number values is supported and (where appropriate) numbers with positive exponents have their exponent reduced to zero. Specifically:
• In the to-number conversion, if the coefficient for a finite number has the value zero, then the sign and the exponent are both set to 0.
• If the coefficient in a result has the value zero, then the sign is set to 0 and (unless the operation is quantize) the exponent is set to 0.[2]
• In the to-number conversion, strings which represent special values are not permitted. (That is, only finite numbers are accepted.)
• Subnormal numbers are not permitted. If the result from a conversion or operation would be subnormal then an Underflow error results (see below).
• After any operation and the rounding of its result (unless the operation is quantize), a result with a positive exponent is converted to an integer provided that the resulting coefficient would have no more than precision digits. In other words, in this case a positive exponent is reduced to 0 by multiplying the coefficient by $10^{\mathit{exponent}}$ (which has the effect of suffixing exponent zeros).[3]
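To make the result rules above concrete, here is a minimal Python sketch that applies them to a finite, already-rounded result held in the specification's `[sign, coefficient, exponent]` form. The `Num` tuple and the `subset_result` function are hypothetical names introduced for illustration; the sketch assumes the caller has already rounded the result, and it does not perform the Underflow check for subnormal results.

```python
from typing import NamedTuple


class Num(NamedTuple):
    """A finite number in the specification's [sign, coefficient, exponent] form."""
    sign: int         # 0 for positive, 1 for negative
    coefficient: int  # non-negative integer
    exponent: int


def subset_result(n: Num, precision: int, is_quantize: bool = False) -> Num:
    """Hypothetical helper: apply the subset rules to an already-rounded result."""
    if n.coefficient == 0:
        # A zero result always has sign 0; its exponent is forced to 0
        # unless the operation is quantize.
        return Num(0, 0, n.exponent if is_quantize else 0)
    if not is_quantize and n.exponent > 0:
        # Suffix `exponent` zeros to the coefficient, but only if the
        # resulting integer still fits in `precision` digits.
        expanded = n.coefficient * 10 ** n.exponent
        if len(str(expanded)) <= precision:
            return Num(n.sign, expanded, 0)
    return n
```

With precision 9, `Num(0, 12, 7)` (that is, 12E+7) becomes `Num(0, 120000000, 0)`, whereas `Num(0, 12, 8)` is left unchanged because 1200000000 would need ten digits.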

#### Operation differences

In the subset arithmetic, operands are rounded before use if necessary (as in Numerical Turing[4]  and Rexx), the Lost digits condition is added to the context, the results of some operations are trimmed, the rounding rule after a subtraction is less conservative, and raising 0 to the power 0 is not treated as an error. Specifically:
• If the number of decimal digits in the coefficient of an operand to an operation is greater than the current precision in the context then the operand is rounded to precision significant digits using the rounding algorithm described by the context before being used in the computation. In other words, an automatic ‘convert to shorter’ is applied before the operation.
• During an add or subtract operation, if either number is zero then the other number, rounded to precision if necessary, is used as the result (with sign adjustment as appropriate).[5]
• The Lost digits condition is added to the abstract context; it should be set to 0 in default contexts.
This condition is raised when non-zero digits are discarded before an operation. This can occur when an operand which has more leading significant digits in its coefficient than the precision setting is rounded to precision digits before use.
Note that the lost digits test does not treat trailing decimal zeros in the coefficient as significant. For example, if precision had the value 5, then the operands
```
[0,12345,-5]
[0,12345,-2]
[0,12345,0]
[1,12345,0]
[0,123450000,-4]
[0,1234500000,0]
```
would not cause an exception (whereas [0,123451,-1] or [0,1234500001,0] would).
• After a divide or power operation is complete and the result has been rounded, any insignificant trailing zeros are removed. That is, if the exponent is not zero and the coefficient is a multiple of a positive power of ten then the coefficient is divided by that power of ten and the exponent increased accordingly. If the exponent was negative it will not be increased above zero.
• After an add or subtract operation, the result is rounded to precision digits if necessary, taking into account any extra (carry) digit on the left after an addition, but otherwise counting from the position corresponding to the most significant digit of the operands being added or subtracted (rather than the most significant digit of the result).
• For the max and min operations, the first (left-hand) operand is chosen if the operands are numerically equal.
• If both operands to a power operation are zero then the result is 1 (instead of being an error); however, if the left-hand operand is zero the right-hand operand must not be negative.
• Subset implementations are only required to provide the power function for integer powers, so the right-hand operand to a power operation may be restricted to integers. In this case the algorithm described below may be used for calculating the result.
• The fused-multiply-add operation is not defined for subset implementations, because the rounding of operands rule conflicts with the requirement for fused-multiply-add to deliver a result with only one rounding.
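Two of the operation rules above, the automatic rounding of over-long operands (with its Lost digits test) and the trimming of trailing zeros after divide and power, can be sketched in Python as follows. The names `Num`, `round_operand`, and `trim_trailing_zeros` are hypothetical; the sketch assumes round-half-up (the only rounding mode the subset requires) and simply reports whether Lost digits would be raised instead of modelling the context's flags and trap enablers.

```python
from typing import NamedTuple, Tuple


class Num(NamedTuple):
    """A finite number in the specification's [sign, coefficient, exponent] form."""
    sign: int
    coefficient: int
    exponent: int


def round_operand(n: Num, precision: int) -> Tuple[Num, bool]:
    """Round an operand to `precision` digits (round-half-up) before use.

    Returns the rounded operand and whether Lost digits would be raised,
    i.e. whether any discarded digit was non-zero.
    """
    digits = len(str(n.coefficient))
    if digits <= precision:
        return n, False
    drop = digits - precision
    kept, discarded = divmod(n.coefficient, 10 ** drop)
    lost_digits = discarded != 0  # discarded trailing zeros are not "lost"
    if 2 * discarded >= 10 ** drop:  # round half up
        kept += 1
        if len(str(kept)) > precision:  # carry, e.g. 99999 -> 100000
            kept //= 10
            drop += 1
    return Num(n.sign, kept, n.exponent + drop), lost_digits


def trim_trailing_zeros(n: Num) -> Num:
    """Remove insignificant trailing zeros from a divide or power result."""
    if n.coefficient == 0 or n.exponent == 0:
        return n
    coefficient, exponent = n.coefficient, n.exponent
    # A negative exponent is never increased above zero.
    while coefficient % 10 == 0 and (n.exponent > 0 or exponent < 0):
        coefficient //= 10
        exponent += 1
    return Num(n.sign, coefficient, exponent)
```

With precision 5, all six operands listed earlier under Lost digits come back with the flag false, while `Num(0, 123451, -1)` rounds to `Num(0, 12345, 0)` with the flag true. Similarly, a quotient of `[0,2500,-3]` (2.500) is trimmed to `[0,25,-1]` (2.5), and `[0,1000,-2]` stops at `[0,10,0]` because the exponent may not rise above zero.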

#### Exceptional condition and rounding mode rules

In the subset arithmetic, exceptional conditions other than the informational conditions (Lost digits, Inexact, Rounded, and Subnormal) must be treated as errors, and results after these errors are undefined. Special values and subnormal numbers, therefore, are not part of the arithmetic.

In the subset, only the Lost digits trap enabler is required. Inexact, Rounded, and Subnormal trap enablers are optional, and the others are (in effect) always set. Similarly, the status bits in the context are optional.

Only the round-half-up rounding mode is required.
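Python's `decimal` module follows the full specification and has no built-in switch for the subset behavior, but its `Context` can approximate the condition and rounding rules above. In this sketch the name `subset_like` and the precision of 9 are illustrative choices. `Subnormal` is trapped because Python signals `Underflow` only for subnormal results that are also inexact, whereas the subset treats every subnormal result as an error. The context does not add Lost digits, reject special-value strings, or apply the other subset rules; those would need separate checks.

```python
from decimal import (ROUND_HALF_UP, Clamped, Context, Decimal, DivisionByZero,
                     InvalidOperation, Overflow, Subnormal, Underflow)

# Approximate the subset's condition handling: round-half-up rounding, and
# the exceptional conditions trapped as errors. Inexact and Rounded stay
# untrapped because they are informational.
subset_like = Context(
    prec=9,
    rounding=ROUND_HALF_UP,
    traps=[Clamped, DivisionByZero, InvalidOperation, Overflow, Underflow,
           Subnormal],
)
```

For example, `subset_like.create_decimal("1.000000005")` rounds the tie upward to `Decimal('1.00000001')`, and `subset_like.divide(Decimal(1), Decimal(0))` raises `DivisionByZero` instead of returning an infinity.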

#### Calculating an integer power

Subset implementations are only required to provide the power function for integer powers. To calculate this, the number (left-hand operand) is in theory multiplied by itself for the number of times expressed by the power. If the right-hand operand is negative, the left-hand operand is used as-is, the absolute value of the right-hand operand is used, and the final result is inverted.[6]

In practice (see the note below for the reasons), the power is often calculated by the process of left-to-right binary reduction. For power(x, n): ‘n’ is converted to binary, and a temporary accumulator is set to 1. If ‘n’ has the value 0 then the initial calculation is complete. Otherwise each bit (starting at the first non-zero bit) is inspected from left to right. If the current bit is 1 then the accumulator is multiplied by ‘x’. If all bits have now been inspected then the initial calculation is complete, otherwise the accumulator is squared by multiplication and the next bit is inspected.

The multiplications and any final division are done under the normal arithmetic operation rules, using the precision supplied for the operation, except that the multiplications (and the division, if needed) are carried out using an increased precision of precision+elength+1 digits. Here, elength is the length in decimal digits of the integer part (coefficient) of the whole number ‘n’ (i.e., excluding any sign, decimal part, decimal point, or insignificant leading zeros).[7]

If, when raising to a negative power, an underflow occurs during the division into 1, the operation is not halted at that point but continues.[8]
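The following Python sketch implements the left-to-right binary reduction described above, using the `decimal` module for the multiplications and the final division. The function name `subset_integer_power` is hypothetical, and round-half-up is assumed for every step. To keep it short, the sketch omits the subset's rounding of the operand `x` before use (with its Lost digits test) and the trimming of trailing zeros from the result. The working contexts take Python's default traps, so an intermediate overflow raises `Overflow`, while `Underflow` is not trapped and an underflow during the final division does not halt the calculation.

```python
from decimal import ROUND_HALF_UP, Context, Decimal


def subset_integer_power(x: Decimal, n: int, precision: int) -> Decimal:
    """Compute x**n for an integer n by left-to-right binary reduction."""
    if x == 0 and n < 0:
        raise ValueError("zero cannot be raised to a negative power")
    # elength: the number of decimal digits in |n|.
    elength = len(str(abs(n)))
    # Intermediate steps use precision + elength + 1 digits.
    work = Context(prec=precision + elength + 1, rounding=ROUND_HALF_UP)
    acc = Decimal(1)
    if n != 0:
        bits = bin(abs(n))[2:]  # binary digits, starting at the first 1 bit
        for i, bit in enumerate(bits):
            if bit == "1":
                acc = work.multiply(acc, x)
            if i < len(bits) - 1:  # more bits to inspect: square first
                acc = work.multiply(acc, acc)
    if n < 0:
        # Negative power: invert the result of the positive power.
        acc = work.divide(Decimal(1), acc)
    # Final rounding to the precision requested for the operation.
    return Context(prec=precision, rounding=ROUND_HALF_UP).plus(acc)
```

For example, with precision 9 and n = 12 (binary 1100, elength 2), the multiplications are carried out to 12 digits; `subset_integer_power(Decimal(3), -1, 9)` computes 1/3 to 11 digits and then rounds it to `0.333333333`.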

Notes:

1. A particular algorithm for calculating integer powers is described, since it is efficient (though not optimal) and considerably reduces the number of actual multiplications performed. It therefore gives better performance than the simpler definition of repeated multiplication. Since results can occasionally differ from those of repeated multiplication, the algorithm is defined here so that different implementations which use it will give identical results for the power operation on the same values, and may therefore use the same testcases. Other algorithms for the power operation may also be used, so long as the result is within 1 ulp (unit in last place) of the correct result.
2. Implementations are encouraged to provide a power operator which will accept a non-integral right-hand operand when the left-hand operand is non-negative, as described in the body of this specification.

Footnotes:
[1] The decNumber package, for example, provides the subset behavior if the extended bit is set to 0.

[2] This rule, together with the to-number definition, ensures that numbers with values such as -0 or 0.0000 will not result from general operations in the subset arithmetic. This allows a concrete representation for the subset to comprise simply two integers in two’s complement form.

[3] The underlying intent here is that positive exponents in the operands are reduced to zero before the operation, so that all operations take place on numbers that could be expressed as ‘plain’ decimal numbers with no exponent. The rule is expressed as a constraint on the result because it is often more convenient or efficient to implement it in this way. The rule also preserves integers as specified by ANSI X3.274, and in particular ensures that the results of the divide and divide-integer operations are identical when the result is an exact integer.

[4] See: T. E. Hull, A. Abrham, M. S. Cohen, A. F. X. Curley, C. B. Hall, D. A. Penny, and J. T. M. Sawchuk, Numerical Turing, SIGNUM Newsletter, vol. 20 #3, pp. 26-34, ACM, May 1985.

[5] In the subset arithmetic, zeros have no exponent.

[6] This rule is slightly more complicated than inverting before the calculation, in that it requires special treatment of overflow and underflow conditions (which were not an issue in X3.274).

[7] The precision specified for the intermediate calculations ensures that the final result will differ by at most 1, in the least significant position, from the ‘true’ result (given that the operands are expressed precisely under the current setting of digits). Half of this maximum error comes from the intermediate calculation, and half from the final rounding.

[8] It can only be halted early if the result becomes zero.
