Decimal Arithmetic Specification, version 1.70
Copyright (c) IBM Corporation, 2009. All rights reserved.
7 Apr 2009

Appendix A – The X3.274 subset

The full specification in the body of this document defines a decimal floating-point arithmetic which gives exact results and preserves exponents where possible. If insufficient precision is available for this, then numbers are handled according to the rules of IEEE 854. The use of IEEE 854 rules implies that special values (infinities and NaNs) are allowed, as are subnormal values and the value –0.
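As an illustration of the full arithmetic, the following is a minimal sketch using Python's standard decimal module, which implements the General Decimal Arithmetic Specification. The function name full_arithmetic_examples and the choice of 9 digits of precision are assumptions made for the example only. All traps are disabled so that exceptional conditions return special values rather than raising errors.

```python
from decimal import Context, Decimal


def full_arithmetic_examples():
    """Return a few results that show full-specification behavior."""
    # Assumption for this sketch: 9 digits of precision and no traps,
    # so exceptional conditions produce special values instead of errors.
    ctx = Context(prec=9, traps=[])
    return {
        # Exact result; the trailing zero (exponent -2) is preserved.
        "exponent_preserved": str(ctx.add(Decimal("1.20"), Decimal("1.30"))),
        # Not enough precision for an exact result, so it is rounded.
        "rounded_to_precision": str(ctx.divide(Decimal(1), Decimal(3))),
        # A zero result keeps its sign.
        "negative_zero": str(ctx.multiply(Decimal("0"), Decimal("-5"))),
        # Division by zero gives an infinity when the trap is off.
        "infinity": str(ctx.divide(Decimal(1), Decimal(0))),
        # An undefined division gives a NaN when the trap is off.
        "nan": str(ctx.divide(Decimal(0), Decimal(0))),
    }
```

The subset described in this appendix would treat the last two cases as errors and would hide the negative zero.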

For some applications and programming languages (especially those intended for use by people who are not mathematically sophisticated), it may be appropriate to provide an arithmetic where infinite, NaN, or subnormal results are always treated as errors, –0 results are hidden, and other (largely cosmetic) changes are provided to aid acceptance of results.

The arithmetic defined in ANSI X3.274 is such an arithmetic; this appendix describes the differences between it and the full specification. Implementations which support this subset explicitly might provide the subset behavior under the control of a parameter in the context[1] or might provide a different interface (additional or parameterized methods, for example).

Simplified number set

In the subset arithmetic, a reduced set of number values is supported and (where appropriate) numbers with positive exponents have their exponent reduced to zero. Specifically:

Operation differences

In the subset arithmetic, operands are rounded before use if necessary (as in Numerical Turing[4] and Rexx), the Lost digits condition is added to the context, the results of some operations are trimmed, the rounding rule after a subtraction is less conservative, and raising 0 to the power 0 is not treated as an error. Specifically:

Exceptional condition and rounding mode rules

In the subset arithmetic, exceptional conditions other than the informational conditions (Lost digits, Inexact, Rounded, and Subnormal) must be treated as errors, and results after these errors are undefined. Special values and subnormal numbers, therefore, are not part of the arithmetic.

In the subset, only the Lost digits trap enabler is required. Inexact, Rounded, and Subnormal trap enablers are optional, and the others are (in effect) always set. Similarly, the status bits in the context are optional.
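The effect of these rules can be approximated with Python's standard decimal module, as in the hedged sketch below. This is an analogy rather than a subset implementation: Python's decimal module has no Lost digits signal, and the helper names subset_like_context and try_divide are hypothetical. The traps chosen are an assumption that reads "exceptional conditions other than the informational conditions" as every condition except Inexact, Rounded, and Subnormal.

```python
from decimal import (Clamped, Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Rounded, Underflow)


def subset_like_context(precision=9):
    """Context in which non-informational conditions raise errors.

    Assumption: Inexact, Rounded, and Subnormal are left untrapped
    (informational only); Python has no Lost digits signal to enable.
    """
    return Context(
        prec=precision,
        traps=[Clamped, DivisionByZero, InvalidOperation, Overflow, Underflow],
    )


def try_divide(ctx, a, b):
    """Divide a by b under ctx, reporting a trapped condition as text."""
    try:
        return str(ctx.divide(Decimal(a), Decimal(b)))
    except (DivisionByZero, InvalidOperation) as exc:
        # No result is returned after an error, so no infinity or NaN
        # ever reaches the caller.
        return "error: " + type(exc).__name__
```

With this context, dividing 1 by 3 returns a rounded result and merely records the Inexact and Rounded flags, while dividing 1 by 0 is reported as an error.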

Only the round-half-up rounding mode is required.
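For readers unfamiliar with the mode, the sketch below contrasts round-half-up with round-half-even using Python's standard decimal module. The helper name round_to is hypothetical; ROUND_HALF_UP and ROUND_HALF_EVEN are the module's own constants.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to(value, digits, rounding):
    """Round value to the given number of significant digits."""
    ctx = Context(prec=digits, rounding=rounding)
    # Context.plus applies the context's precision and rounding mode.
    return str(ctx.plus(Decimal(value)))
```

Under round-half-up, a discarded part of exactly one half is always rounded away from zero, so round_to("2.5", 1, ROUND_HALF_UP) gives 3 where round-half-even would give 2.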

Calculating an integer power

Subset implementations are only required to provide the power function for integer powers. To calculate this, the number (left-hand operand) is in theory multiplied by itself for the number of times expressed by the power. If the right-hand operand is negative, the left-hand operand is used as-is, the absolute value of the right-hand operand is used, and the final result is inverted.[6] 

In practice (see the note below for the reasons), the power is often calculated by the process of left-to-right binary reduction. For power(x, n): ‘n’ is converted to binary, and a temporary accumulator is set to 1. If ‘n’ has the value 0 then the initial calculation is complete. Otherwise each bit (starting at the first non-zero bit) is inspected from left to right. If the current bit is 1 then the accumulator is multiplied by ‘x’. If all bits have now been inspected then the initial calculation is complete, otherwise the accumulator is squared by multiplication and the next bit is inspected.
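The left-to-right binary reduction just described can be sketched in Python as follows. This is an illustrative sketch for a non-negative integer ‘n’ only; the function name power_binary and the use of a Python decimal Context to carry the working precision are assumptions of the example, not part of the specification.

```python
from decimal import Context, Decimal


def power_binary(x, n, ctx):
    """Compute x ** n (n a non-negative int) by left-to-right binary reduction."""
    acc = Decimal(1)              # temporary accumulator starts at 1
    if n == 0:
        return acc                # initial calculation is already complete
    bits = bin(n)[2:]             # binary digits, starting at the first 1 bit
    last = len(bits) - 1
    for i, bit in enumerate(bits):
        if bit == "1":
            acc = ctx.multiply(acc, x)    # current bit is 1: multiply by x
        if i == last:
            break                          # all bits inspected
        acc = ctx.multiply(acc, acc)       # square, then inspect the next bit
    return acc
```

For n = 10 (binary 1010) the accumulator takes the values x, x^2, x^4, x^5, and x^10, using five multiplications rather than the ten of repeated multiplication.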

The multiplications and any final division are done under the normal arithmetic operation rules, using the precision supplied for the operation, except that the multiplications (and the division, if needed) are carried out using an increased precision of precision+elength+1 digits. Here, elength is the length in decimal digits of the integer part (coefficient) of the whole number ‘n’ (i.e., excluding any sign, decimal part, decimal point, or insignificant leading zeros).[7]

If, when raising to a negative power, an underflow occurs during the division into 1, the operation is not halted at that point but continues.[8] 
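The precision rule and the treatment of negative powers can be sketched in Python as below. For brevity the sketch uses the "in theory" definition of repeated multiplication rather than binary reduction, and it does not model the underflow behavior described above. The names elength, working_context, and subset_power are hypothetical, and the explicit final rounding step to the requested precision is an assumption drawn from footnote [7].

```python
from decimal import Context, Decimal, ROUND_HALF_UP


def elength(n):
    """Number of decimal digits in the integer n, ignoring any sign."""
    return len(str(abs(n)))


def working_context(precision, n):
    """Context with the increased precision of precision+elength+1 digits."""
    return Context(prec=precision + elength(n) + 1, rounding=ROUND_HALF_UP)


def subset_power(x, n, precision):
    """Raise x to the integer power n, rounding the result to precision digits."""
    work = working_context(precision, n)
    acc = Decimal(1)
    for _ in range(abs(n)):           # use the absolute value of the power
        acc = work.multiply(acc, x)   # multiply at the increased precision
    if n < 0:
        acc = work.divide(Decimal(1), acc)   # invert the final result
    # Assumed final step: round once to the precision requested.
    final = Context(prec=precision, rounding=ROUND_HALF_UP)
    return final.plus(acc)
```

For example, with a precision of 9 and n = 100, elength is 3 and the intermediate calculations use 13 digits.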

Notes:

  1. A particular algorithm for calculating integer powers is described, since it is efficient (though not optimal) and considerably reduces the number of actual multiplications performed. It therefore gives better performance than the simpler definition of repeated multiplication. Since results can occasionally differ from those of repeated multiplication, the algorithm is defined here so that different implementations which use it will give identical results for the power operation on the same values, and may therefore use the same test cases. Other algorithms for the power operation may also be used, so long as the result is within 1 ulp (unit in last place) of the correct result.
  2. Implementations are encouraged to provide a power operator which will accept a non-integral right-hand operand when the left-hand operand is non-negative, as described in the body of this specification.

Footnotes:
[1] The decNumber package, for example, provides the subset behavior if the extended bit is set to 0.
[2] This rule, together with the to-number definition, ensures that numbers with values such as -0 or 0.0000 will not result from general operations in the subset arithmetic. This allows a concrete representation for the subset to comprise simply two integers in two’s complement form.
[3] The underlying intent here is that positive exponents in the operands are reduced to zero before the operation, so that all operations take place on numbers that could be expressed as ‘plain’ decimal numbers with no exponent. The rule is expressed as a constraint on the result because it is often more convenient or efficient to implement it in this way. The rule also preserves integers as specified by ANSI X3.274, and in particular ensures that the results of the divide and divide-integer operations are identical when the result is an exact integer.
[4] See: T. E. Hull, A. Abrham, M. S. Cohen, A. F. X. Curley, C. B. Hall, D. A. Penny, and J. T. M. Sawchuk, Numerical Turing, SIGNUM Newsletter, vol. 20 #3, pp26-34, ACM, May 1985.
[5] In the subset arithmetic, zeros have no exponent.
[6] This rule is slightly more complicated than inverting before the calculation, in that it requires special treatment of overflow and underflow conditions (which were not an issue in X3.274).
[7] The precision specified for the intermediate calculations ensures that the final result will differ by at most 1, in the least significant position, from the ‘true’ result (given that the operands are expressed precisely under the current setting of digits). Half of this maximum error comes from the intermediate calculation, and half from the final rounding.
[8] It can only be halted early if the result becomes zero.
