Decimal Arithmetic Specification, version 1.70
Copyright (c) IBM Corporation, 2009. All rights reserved. 7 Apr 2009
There are three components to the model: numbers, operations, and context.
This specification defines these components in the abstract. It neither defines the way in which operations are expressed (which might vary depending on the computer language or other interface being used),^{[1]} nor does it define the concrete representation (specific layout in storage, or in a processor’s register, for example) of numbers or context.
The remainder of this section describes the abstract model for each component.
The quantum of a finite number is given by: 1 × 10^{exponent}. This is the value of a unit in the least significant position of the coefficient of a finite number.^{[5]}
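As an illustration, the quantum depends only on the exponent, not on the coefficient. The sketch below uses Python's decimal module, assumed here to follow this abstract model; the helper name quantum is hypothetical and not part of this specification.

```python
from decimal import Decimal

def quantum(d: Decimal) -> Decimal:
    """Return 1 x 10^exponent for a finite Decimal (hypothetical helper)."""
    if not d.is_finite():
        raise ValueError("quantum is defined only for finite numbers")
    # as_tuple() exposes (sign, digits, exponent); only the exponent is used.
    return Decimal((0, (1,), d.as_tuple().exponent))

q1 = quantum(Decimal("27.08"))   # coefficient 2708, exponent -2
q2 = quantum(Decimal("1.2E+3"))  # coefficient 12, exponent 2
```

Here q1 is one unit in the hundredths position and q2 is one unit in the hundreds position.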
This abstract definition deliberately allows for multiple representations of values which are numerically equal but are visually distinct (such as 1 and 1.00). However, there is a one-to-one mapping between the abstract representation and the result of the primary conversion to string using to-scientific-string on that abstract representation. In other words, if one number has a different abstract representation to another, then the primary string conversion will also be different.
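A minimal sketch of this distinction, assuming Python's decimal module as the implementation and its str() conversion as the primary string conversion:

```python
from decimal import Decimal

a = Decimal("1")
b = Decimal("1.00")

same_value = (a == b)                       # numerically equal
same_repr = (a.as_tuple() == b.as_tuple())  # different coefficient and exponent
strings = (str(a), str(b))                  # string conversions differ as well
```

The two numbers compare equal, yet their abstract representations and their string forms remain distinct.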
Notes:
All special values may have a sign, as for finite numbers. The sign of an infinity is significant (that is, it is possible to have both positive and negative infinity), and the sign of a NaN has no meaning, although it may be considered part of the diagnostic information.
For a subnormal result, the minimum value of the exponent becomes E_{min}–(precision–1), called E_{tiny}, where precision is the working precision, as described below. The result will be rounded, if necessary, to ensure that the exponent is no smaller than E_{tiny}. If, during this rounding, the result becomes inexact, then the Underflow condition is raised. A subnormal result does not necessarily raise Underflow, therefore, but is always indicated by the Subnormal condition (even if, after rounding, its value is 0 or ten to the power of E_{min}).
When a number underflows to zero during a calculation, its exponent will be E_{tiny}. The maximum value of the exponent is unaffected.
Note that the minimum value of the exponent for subnormal numbers is the same as the minimum value of exponent which can arise during operations which do not result in subnormal numbers, which occurs in the case where clength = precision.
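The subnormal rules above can be sketched as follows, assuming Python's decimal module, whose Context.Etiny() method is taken here to correspond to E_{tiny}. The precision and exponent limits are arbitrary values chosen for illustration.

```python
from decimal import Context, Decimal, Subnormal, Underflow

# Illustrative settings: precision 5, Emin -10, all traps disabled.
ctx = Context(prec=5, Emin=-10, Emax=10, traps=[])
etiny = ctx.Etiny()   # Emin - (precision - 1) = -10 - 4

# Exact subnormal result: Subnormal is raised, Underflow is not.
exact = ctx.multiply(Decimal("1E-8"), Decimal("1E-5"))
flags_exact = (bool(ctx.flags[Subnormal]), bool(ctx.flags[Underflow]))

# Result below the subnormal range: it rounds to zero with exponent Etiny.
ctx.clear_flags()
zero = ctx.multiply(Decimal("1E-10"), Decimal("1E-10"))
flags_zero = (bool(ctx.flags[Subnormal]), bool(ctx.flags[Underflow]))
```

The first product is subnormal but exact, so only Subnormal is flagged; the second loses its only non-zero digit, so Underflow is flagged too.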
Similarly, pairs or triads are used to indicate the special values. These have the form [sign, special-value] or the form [sign, special-value, diagnostic], where the sign is indicated as before, and the special-value is one of inf, qNaN, or sNaN, representing infinity, quiet NaN, or signaling NaN, respectively, and diagnostic is a positive integer.
So, for example, the triad [0,2708,-2] represents the number 27.08, the triad [1,1953,0] represents the integer -1953, the pair [1,inf] represents the number –∞, and the pair [0,qNaN] represents a quiet NaN.
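These examples can be reproduced with Python's decimal module, assumed here as one possible concrete interface. Its constructor accepts a (sign, digits, exponent) tuple that mirrors the triad notation, with the coefficient written as a tuple of digits.

```python
from decimal import Decimal

n1 = Decimal((0, (2, 7, 0, 8), -2))   # triad [0,2708,-2]
n2 = Decimal((1, (1, 9, 5, 3), 0))    # triad [1,1953,0]
neg_inf = Decimal("-Infinity")        # pair  [1,inf]
qnan = Decimal("NaN")                 # pair  [0,qNaN]
snan_diag = Decimal("sNaN123")        # triad [0,sNaN,123]
```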
For example, in an object-oriented language, the addition operation might be effected by a method called add, whereas in a calculator application it might be effected by clicking on a button icon. In other uses, an infix ‘+’ symbol might be used to indicate addition. And in all cases, the operation might be carried out in software, hardware, or some combination of these.
Similarly, operations which are distinct in the specification need not be mapped one-to-one to distinct operations in the implementation – it is only necessary that all the core operations are available. For example, conversions to a string could be handled by a single method, with variations determined from context or additional arguments.
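As one illustration of such mappings, Python's decimal module (an assumption of this sketch, not a requirement of the specification) exposes addition both as a context method and as an infix operator, and offers more than one string conversion.

```python
from decimal import Context, Decimal

ctx = Context(prec=9)
a, b = Decimal("1.3"), Decimal("1.07")

via_method = ctx.add(a, b)   # addition expressed as a method call
via_infix = a + b            # the same operation via infix '+', using the thread's current context

big = Decimal("12E+3")
sci = str(big)               # scientific notation
eng = big.to_eng_string()    # engineering notation (exponent a multiple of 3)
```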
The context is defined by the following parameters:
precision: An integer which must be positive (greater than 0). This sets the maximum number of significant digits that can result from an arithmetic operation.
In the abstract, there is no upper bound on the precision (although a specific precision must always be provided). In practice there may need to be some upper limit to it (for example, the length of the maximum coefficient supported by a concrete representation). This limit must be expressed as an integral number of decimal digits.
Similarly, there may be a lower bound on the setting on precision, which may be the same as the upper bound (for example, if it is implied by the length of the maximum coefficient supported by a concrete representation). This limit must also be expressed as an integral number of decimal digits.
rounding: A named value which indicates the algorithm to be used when rounding is necessary. Rounding is applied when a result coefficient has more significant digits than the value of precision; in this case the result coefficient is shortened to precision digits and may then be incremented by one (which may require a further shortening), depending on the rounding algorithm selected and the remaining digits of the original coefficient. The exponent is adjusted to compensate for any shortening.
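The shortening, the possible increment, and the further shortening can be sketched with Python's decimal module (assumed here as the implementation). A precision of 3 is chosen only to keep the numbers small.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

ctx = Context(prec=3, rounding=ROUND_HALF_UP)

# 12345: coefficient shortened to 123; the exponent rises by 2 to compensate.
r1 = ctx.plus(Decimal("12345"))

# 9.999: shortened to 999, incremented to 1000, then shortened again to 100.
r2 = ctx.plus(Decimal("9.999"))
```

In the second case the increment produces a four-digit coefficient, so one more digit is dropped and the exponent is adjusted once more.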
The five following rounding algorithms are defined and must be supported:^{[9]}
round-down: (Round toward 0; truncate.) The discarded digits are ignored; the result is unchanged.
round-half-up: If the discarded digits represent greater than or equal to half (0.5) of the value of a one in the next left position then the result coefficient should be incremented by 1 (rounded up). Otherwise the discarded digits are ignored.
round-half-even: If the discarded digits represent greater than half (0.5) the value of a one in the next left position then the result coefficient should be incremented by 1 (rounded up). If they represent less than half, then the result coefficient is not adjusted (that is, the discarded digits are ignored).
Otherwise (they represent exactly half) the result coefficient is unaltered if its rightmost digit is even, or incremented by 1 (rounded up) if its rightmost digit is odd (to make an even digit).
round-ceiling: (Round toward +∞.) If all of the discarded digits are zero or if the sign is 1 the result is unchanged. Otherwise, the result coefficient should be incremented by 1 (rounded up).
round-floor: (Round toward –∞.) If all of the discarded digits are zero or if the sign is 0 the result is unchanged. Otherwise, the sign is 1 and the result coefficient should be incremented by 1.
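The five required algorithms can be compared side by side. This sketch assumes that the constants ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_CEILING, and ROUND_FLOOR in Python's decimal module correspond, in that order, to the algorithms described above; the helper to_int is hypothetical.

```python
from decimal import (Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_CEILING, ROUND_FLOOR)

def to_int(text: str, mode: str) -> str:
    """Round a decimal string to exponent 0 using the given rounding mode."""
    return str(Decimal(text).quantize(Decimal("1"), rounding=mode))

samples = ["2.5", "3.5", "-2.5", "2.1"]
table = {
    mode: [to_int(s, mode) for s in samples]
    for mode in (ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN,
                 ROUND_CEILING, ROUND_FLOOR)
}
```

The half-way samples (2.5, 3.5, -2.5) separate the two "half" algorithms, while the negative sample separates ceiling and floor from truncation.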
Three further rounding algorithms are defined; these are optional:
round-half-down: If the discarded digits represent greater than half (0.5) of the value of a one in the next left position then the result coefficient should be incremented by 1 (rounded up). Otherwise (the discarded digits are 0.5 or less) the discarded digits are ignored.
round-up: (Round away from 0.) If all of the discarded digits are zero the result is unchanged. Otherwise, the result coefficient should be incremented by 1 (rounded up).
round-05up: (Round zero or five away from 0.) The same as round-up, except that rounding up only occurs if the digit to be rounded up is 0 or 5, and after overflow the result is the same as for round-down.^{[10]}
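A sketch of the three optional algorithms, assuming that ROUND_HALF_DOWN, ROUND_UP, and ROUND_05UP in Python's decimal module correspond to them in that order; the helper to_int is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_UP, ROUND_05UP

def to_int(text: str, mode: str) -> str:
    """Round a decimal string to exponent 0 using the given rounding mode."""
    return str(Decimal(text).quantize(Decimal("1"), rounding=mode))

half_down = [to_int(s, ROUND_HALF_DOWN) for s in ("2.5", "2.51", "-2.5")]
away = [to_int(s, ROUND_UP) for s in ("2.1", "-2.1", "2.0")]
# Last kept digit 2 -> truncate; last kept digit 5 or 0 -> round away from zero.
zero_five = [to_int(s, ROUND_05UP) for s in ("2.1", "5.1", "10.1", "10.0")]
```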
flags and trap-enablers: The exceptional conditions are grouped into signals, which can be controlled individually. The context contains a flag (which is either 0 or 1) and a trap-enabler (which also is either 0 or 1) for each signal.
For each of the signals, the corresponding flag is set to 1 when the signal occurs. It is only reset to 0 by explicit user action.
For each of the signals, the corresponding trap-enabler indicates which action is to be taken when the signal occurs (see IEEE 754 §7). If 0, a defined result is supplied, and execution continues (for example, an overflow is perhaps converted to a positive or negative infinity). If 1, then execution of the operation is ended or paused and control passes to a ‘trap handler’, which will have access to the defined result.
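The interaction of flags and trap-enablers might look like this in Python's decimal module, which is assumed here as the implementation and which realizes a trap as a raised exception (one possible form of trap handler).

```python
from decimal import Context, Decimal, DivisionByZero

ctx = Context(prec=9, traps=[])                 # every trap-enabler is 0

result = ctx.divide(Decimal(1), Decimal(0))     # defined result; execution continues
flag_after = bool(ctx.flags[DivisionByZero])    # the flag is now 1

ctx.add(Decimal(1), Decimal(1))                 # a later clean operation
flag_sticky = bool(ctx.flags[DivisionByZero])   # does not reset the flag

ctx.clear_flags()                               # explicit user action
flag_cleared = bool(ctx.flags[DivisionByZero])

ctx.traps[DivisionByZero] = True                # trap-enabler set to 1
try:
    ctx.divide(Decimal(1), Decimal(0))
    trapped = False
except DivisionByZero:
    trapped = True                              # control passed to the handler
```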
The signals are:
Clamped: raised when the exponent of a result has been altered or constrained in order to fit the constraints of a specific concrete representation
Division by zero: raised when a non-zero dividend is divided by zero
Inexact: raised when a result is not exact (one or more non-zero coefficient digits were discarded during rounding)
Invalid operation: raised when a result would be undefined or impossible
Overflow: raised when the exponent of a result is too large to be represented
Rounded: raised when a result has been rounded (that is, some zero or non-zero coefficient digits were discarded)
Subnormal: raised when a result is subnormal (its adjusted exponent is less than E_{min}), before any rounding
Underflow: raised when a result is both subnormal and inexact.
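Several of these signals can be observed with a small experiment. The sketch assumes Python's decimal module and deliberately tiny limits (precision 3, exponent range -5 to 5); the helpers raised and run are hypothetical.

```python
from decimal import (Context, Decimal, Clamped, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Rounded, Subnormal, Underflow)

SIGNALS = (Clamped, DivisionByZero, Inexact, InvalidOperation,
           Overflow, Rounded, Subnormal, Underflow)

def raised(ctx):
    """Names of the signals whose flags are set in ctx."""
    return [s.__name__ for s in SIGNALS if ctx.flags[s]]

def run(op):
    """Apply op to a fresh context with all traps disabled."""
    ctx = Context(prec=3, Emin=-5, Emax=5, traps=[])
    return str(op(ctx)), raised(ctx)

third = run(lambda c: c.divide(Decimal(1), Decimal(3)))         # non-zero digits discarded
trimmed = run(lambda c: c.plus(Decimal("1.000")))               # only a zero digit discarded
undefined = run(lambda c: c.divide(Decimal(0), Decimal(0)))     # 0/0 is undefined
div_zero = run(lambda c: c.divide(Decimal(1), Decimal(0)))      # non-zero dividend
too_big = run(lambda c: c.multiply(Decimal("1E5"), Decimal(10)))  # exponent above Emax
```

Note that discarding a zero digit raises Rounded without Inexact, and that 0/0 raises Invalid operation rather than Division by zero.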
This specification does not define the means by which flags and traps are reset or altered, respectively, or the means by which traps are effected.^{[11]}
The context might also specify further variables, such as E_{max} where a variable exponent bound is required.
Notes:
[1] Indeed, some variations of operations could be selected by using context settings outside the scope of this specification.
[2] That is, the maximum value of the coefficient will be an integral power of ten, less one – for example, 99999999999999999999.
[3] See IEEE 854 §3.1.
[4] This rule, a requirement for both ANSI X3.274 and IEEE 854, constrains the number of values which would overflow or underflow when inverted (divided into 1).
[5] This is slightly different from an ulp (unit in last position), which is defined such that ulp(x) is the difference between the two nearest bracketing representable values to x, and which if x is exactly representable and is an exact power of the base gives the ‘ulp below’.
[6] Typically, in a concrete representation, certain out-of-range values of the exponent are used to indicate the special values, and the coefficient is used to carry additional diagnostic information for quiet NaNs. In the case of the proposed IEEE 754 decimal formats, the exponent is 0, the coefficient (excluding the first digit) may hold a decimal value which is the diagnostic information, and the special value is indicated by the combination field and exponent continuation bits.
[7] This restriction allows the abstract coefficient in IEEE 754 encodings to be used to hold the diagnostic information for a NaN.
[8] That is, numbers whose absolute value is non-zero and is closer to zero than ten to the power of E_{min}.
[9] The term ‘round to nearest’ is not used because it is ambiguous. round-half-up is the usual round-to-nearest algorithm used in European countries, in international financial dealings, and in the USA for tax calculations. round-half-even is often used for other applications in the USA, where it is usually called ‘round to nearest’ and is sometimes called ‘banker’s rounding’.
[10] The rounding mode round-05up permits arithmetic at shorter lengths to be emulated in a fixed-precision environment without double rounding. For example, a multiplication at a precision of 9 can be effected by carrying out the multiplication at (say) 16 digits using round-05up and then rounding to the required length using the desired rounding algorithm.
[11] IEEE 754 suggests that there be a mechanism allowing traps to return a substitute result to the operation that raised the exception, but this may not be possible in some environments (including some object-oriented environments).
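The double-rounding problem described in note [10] can be sketched as follows, assuming Python's decimal module. For brevity the sketch rounds a single 18-digit value with the plus operation rather than performing a multiplication; the value is chosen so that two successive round-half-even steps give a different answer from a single one.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN, ROUND_05UP

# 1.00000000500000001: just above the half-way point at 9 digits.
x = Decimal("1." + "0" * 8 + "5" + "0" * 7 + "1")

narrow = Context(prec=9, rounding=ROUND_HALF_EVEN)
direct = narrow.plus(x)                           # one rounding, straight to 9 digits

wide_even = Context(prec=16, rounding=ROUND_HALF_EVEN)
double_rounded = narrow.plus(wide_even.plus(x))   # 16 digits, then 9, both half-even

wide_05up = Context(prec=16, rounding=ROUND_05UP)
emulated = narrow.plus(wide_05up.plus(x))         # round-05up at 16 digits, then half-even
```

The intermediate half-even rounding discards the trailing 1 and leaves an exact tie, which then rounds to even. Round-05up instead keeps a non-zero marker in the last position, so the final rounding matches the direct result.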