Decimal Arithmetic Specification, version 1.70
Copyright (c) IBM Corporation, 2009. All rights reserved.
7 Apr 2009

Appendix B – Design concepts

This appendix summarizes the concepts underlying the arithmetic described in this document, as background information. It is not part of the specification.

The decimal arithmetic specified in this document is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.[1] 

Many people are unaware that the algorithms taught for ‘manual’ decimal arithmetic are quite different in different countries, but fortunately (and not surprisingly) the end results differ only in details of rounding and presentation. The particular model chosen was based on an extensive study of decimal arithmetic and was then evolved over several years (1979–1982) in response to feedback from thousands of users in more than forty countries. Numerous implementations have been written since 1982, and minor refinements to the definition were made during the process of ANSI standardization (1991–1996).[2] 

This base floating-point model has proved suitable for extension to meet the additional requirements and facilities defined in ANSI/IEEE 854-1987,[3] and the full specification is, in effect, the union of the floating-point specifications of the two standards. This means that the same number system and arithmetic support, without prejudice, both exact unrounded decimal arithmetic (sometimes called ‘fixed-point’ arithmetic) and rounded floating-point arithmetic. The latter includes the facilities and number values which are now widespread in binary floating-point implementations.
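As a minimal sketch of this point, the following uses Python's standard `decimal` module, whose documentation states that it is based on this specification. The 9-digit precision is an arbitrary choice for illustration. The same addition is exact when the result fits within the precision, and is rounded (and flagged as inexact) when it does not.

```python
from decimal import Context, Decimal, Inexact

# A precision of 9 digits is an arbitrary choice for this illustration.
ctx = Context(prec=9)

# Exact ("fixed-point") use: the 7-digit result fits, so no rounding occurs.
exact = ctx.add(Decimal("12345.67"), Decimal("0.01"))
exact_was_inexact = bool(ctx.flags[Inexact])

# Rounded floating-point use: the exact sum 1234567.895 needs 10 digits,
# so it is rounded to 9 digits (round-half-even, copied from the default context).
rounded = ctx.add(Decimal("1234567.89"), Decimal("0.005"))
rounded_was_inexact = bool(ctx.flags[Inexact])

print(exact, exact_was_inexact)      # 12345.68 False
print(rounded, rounded_was_inexact)  # 1234567.90 True
```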

Fundamental concepts

When people carry out arithmetic operations, such as adding or multiplying two numbers together, they commonly use decimal arithmetic where the decimal point ‘floats’ as required, and the result that they eventually write down depends on three factors:
  1. the specific operation carried out
  2. the explicit information in the operand or operands to the operation
  3. the information from the implied context in which the calculation is carried out (the precision required, etc.).
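The three factors can be separated in a small sketch using Python's `decimal` module, where `Context` methods such as `add` and `divide` take the context explicitly rather than implicitly. The two precisions (6 and 12 digits) are arbitrary choices for illustration.

```python
from decimal import Context, Decimal

# Two contexts that differ only in precision; 6 and 12 digits are arbitrary.
ctx6 = Context(prec=6)
ctx12 = Context(prec=12)
one, three = Decimal("1"), Decimal("3")

# 1. The operation: same operands and context, different operations.
print(ctx6.add(one, three), ctx6.divide(one, three))                  # 4 0.333333

# 2. The operands: 1.50 explicitly carries one more digit than 1.5.
print(ctx6.add(Decimal("1.50"), one), ctx6.add(Decimal("1.5"), one))  # 2.50 2.5

# 3. The context: same operation and operands, different precision.
print(ctx6.divide(one, three), ctx12.divide(one, three))  # 0.333333 0.333333333333
```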

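The information explicit in the written representation of an operand is more than that conventionally encoded for floating-point arithmetic. Specifically, there is:
  1. an optional sign (only significant when negative)
  2. a numeric part, or numeric, which may include a decimal point (which is only significant if followed by any digits)
  3. an optional exponent, which denotes a power of ten by which the numeric is multiplied (significant if both the numeric and exponent are non-zero).
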
The length of the numeric and original position of the decimal point are not encoded in traditional floating-point representations, such as ANSI/IEEE 754-1985,[4]  yet they are essential information if the expected result is to be obtained.

For example, people expect trailing zeros to be indicated conventionally in a result: the sum 1.57 + 2.03 is expected to result in 3.60, not 3.6; however, if the positional information has been lost during the operation it is no longer possible to show the expected result. For some applications the loss of trailing zeros is materially significant.
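A short sketch of the 1.57 + 2.03 example, contrasting Python's `decimal.Decimal` (which keeps the exponent of each operand) with Python's built-in `float` (a binary floating-point type, which records no information about how many digits were written):

```python
from decimal import Decimal

# Decimal keeps the exponent of each operand, so the sum keeps two decimal places.
total = Decimal("1.57") + Decimal("2.03")
print(total)                                         # 3.60

# A binary float has no notion of how many digits were written:
# the literals 3.60 and 3.6 give the same value and the same printed form.
print(3.60, 3.60 == 3.6)                             # 3.6 True

# The decimal results are numerically equal but remain distinguishable.
print(total == Decimal("3.6"), str(total) == "3.6")  # True False
```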

Fortunately, the later standard ANSI/IEEE 854-1987, which is intended for decimal as well as binary floating-point arithmetic, does not proscribe representations which do preserve the desired information. A suitable internal representation for decimal numbers therefore comprises a sign, an integer (called the coefficient in this document), and an exponent (which is an integral power of ten).
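The sign, coefficient, and exponent of a finite number can be inspected with Python's `Decimal.as_tuple()`, which returns the coefficient as a tuple of digits. The helpers `decompose` and `recompose` below are hypothetical names introduced for this sketch; they show that 3.60 and 3.6 have different coefficients and exponents, and that the triple rebuilds the original representation exactly.

```python
from decimal import Decimal

def decompose(text):
    """Hypothetical helper: split a finite decimal string into
    (sign, coefficient, exponent), where sign is 0 for positive and 1 for
    negative, and the value is (-1)**sign * coefficient * 10**exponent."""
    sign, digits, exponent = Decimal(text).as_tuple()
    coefficient = int("".join(str(d) for d in digits))
    return sign, coefficient, exponent

def recompose(sign, coefficient, exponent):
    """Hypothetical helper: rebuild a Decimal from the triple, keeping trailing zeros."""
    digits = tuple(int(d) for d in str(coefficient))
    return Decimal((sign, digits, exponent))

for text in ["3.60", "3.6", "-0.0025"]:
    triple = decompose(text)
    print(text, triple, recompose(*triple))
# 3.60 (0, 360, -2) 3.60
# 3.6 (0, 36, -1) 3.6
# -0.0025 (1, 25, -4) -0.0025
```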

Similarly, decimal arithmetic in a scientific or engineering context is based on a floating-point model, not a fixed-point or fixed-scale model (indeed, this is the original basis for the concepts behind binary floating-point). Fixed-point decimal arithmetic packages such as the BigDecimal class in Java 1.1 are therefore only useful for a subset of the problems for which arithmetic is used.
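To illustrate why a fixed-scale model covers only a subset of problems, the sketch below multiplies two small values with Python's floating-point `Decimal`, then emulates a hypothetical fixed scale of two decimal places using `quantize`. This is an emulation for illustration only, not the behavior of Java's BigDecimal or any particular package.

```python
from decimal import Decimal

a, b = Decimal("0.001"), Decimal("0.002")

# Floating-point decimal: the exponent adjusts, so the small product survives.
product = a * b
print(product)                              # 0.000002

# A hypothetical fixed scale of two decimal places (emulated with quantize)
# has no room for the result, which is reduced to zero.
fixed = product.quantize(Decimal("0.01"))
print(fixed)                                # 0.00
```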

The information contained in the context of a calculation is also important. It usually applies to an entire sequence of operations, rather than to a single operation, and is not associated with individual operands. In practice, sensible defaults can be assumed, though provision for user control is necessary for many applications.
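In Python's `decimal` module, `localcontext()` sets a context for a whole block of code, which matches the idea that context applies to a sequence of operations rather than to operands. The prices and the 4-digit precision below are arbitrary values for illustration.

```python
from decimal import Decimal, localcontext

prices = [Decimal("19.99"), Decimal("5.49"), Decimal("0.75")]

# One context governs the whole sequence of operations inside the block:
# every addition performed by sum() and the final division use 4 digits.
with localcontext() as ctx:
    ctx.prec = 4                     # arbitrary precision for illustration
    total = sum(prices)
    average = total / len(prices)

# Outside the block the previous (default) context applies again.
default_average = total / len(prices)

print(total, average)                # 26.23 8.743
print(default_average)               # 28 significant digits, assuming the default precision of 28
```

The operands themselves carry no precision setting; only the surrounding context changes between the two divisions.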

The most important contextual information is the desired precision for the calculation. This can range from rather small values (such as six digits) through very large values (hundreds or thousands of digits) for certain problems in Mathematics and Physics. Some decimal arithmetics (for example, the decimal arithmetic in the Atari Operating System) offer just one or two alternatives for precision – in some cases, for apparently arbitrary reasons. Again, this does not match the user model of decimal arithmetic; one designed for people to use must provide a wide range of available precisions.

This specification provides for user selection of precision; the representation (especially if it is to conform to the IEEE 854-1987 standard referred to above) may have a fixed maximum precision, but up to the maximum allowed by the representation the precision used for operations may be chosen by the programmer.
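A minimal sketch of programmer-selected precision, using Python's `decimal` module to compute the square root of 2 at a small and a large precision. The function name `sqrt2` and the two precisions are arbitrary choices for illustration.

```python
from decimal import Decimal, localcontext

def sqrt2(digits):
    """Square root of 2, computed in a context with the given precision."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(2).sqrt()

print(sqrt2(6))    # 1.41421
print(sqrt2(50))   # 50 significant digits
```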

The provision of context for arithmetic operations is therefore a necessary precondition if the desired results are to be achieved, just as a ‘locale’ is needed for operations involving text.

This specification provides for explicit control over several aspects of the context, including: the required precision (the point at which rounding is applied), the rounding algorithm to be used when digits have to be discarded, the range of normal numbers (which determines the bounds for overflow and underflow), and finally a set of flags and trap-enablers which report exceptional conditions and control how they are handled.
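Python's `decimal.Context` exposes each of these controls, so a sketch can touch all four: precision, rounding algorithm, exponent range, and flags with trap-enablers. All the settings below are arbitrary values chosen for illustration.

```python
from decimal import (Context, Decimal, Inexact, Overflow, Rounded,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# All settings below are arbitrary values chosen for illustration.
ctx = Context(prec=5,                     # precision: round to 5 digits
              rounding=ROUND_HALF_EVEN,   # rounding algorithm
              Emin=-99, Emax=99,          # range of normal numbers
              traps=[Overflow])           # only Overflow raises an exception

x = Decimal("1.23425")                    # 6 digits, exactly halfway at 5 digits
half_even = ctx.plus(x)                   # 1.2342
ctx.rounding = ROUND_HALF_UP
half_up = ctx.plus(x)                     # 1.2343

# Flags record the conditions that occurred and stay set until cleared.
print(half_even, half_up, bool(ctx.flags[Inexact]), bool(ctx.flags[Rounded]))
# 1.2342 1.2343 True True

# 1E+99 * 10 has adjusted exponent 100 > Emax, and Overflow is trapped.
try:
    ctx.multiply(Decimal("1E+99"), Decimal(10))
    overflow_trapped = False
except Overflow:
    overflow_trapped = True
print(overflow_trapped)                   # True
```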

[1] For more discussion on why this is important, see the Frequently Asked Questions about decimal arithmetic at
[2] See ANSI standard X3.274-1996: American National Standard for Information Technology – Programming Language REXX, X3.274-1996, American National Standards Institute, New York, 1996.
[3] ANSI/IEEE 854-1987 – IEEE Standard for Radix-Independent Floating-Point Arithmetic, The Institute of Electrical and Electronics Engineers, Inc., New York, 1987.
[4] ANSI/IEEE 754-1985 – IEEE Standard for Binary Floating-Point Arithmetic, The Institute of Electrical and Electronics Engineers, Inc., New York, 1985.
