Decimal Arithmetic

Decimal Arithmetic FAQ
Part 5 – Encoding Questions

How are decimal numbers encoded?

There are many ways of encoding numbers, but here we'll only discuss decimal numbers which are encoded in a series of contiguous bytes (like binary floats and doubles) and which are described by a pair of parameters: a coefficient which is multiplied by ten raised to the power of an exponent. For information on other forms of decimal encodings, see “How are the parts of decimal numbers encoded?”.

The value of a number encoded with these two parameters is coefficient × 10^exponent. For example, if the coefficient is 9 and the exponent is 3 then the value of the number is 9000 (9×10^3), and if the exponent were -2 then the value would be 0.09 (9×10^-2).

(For simplicity, only positive numbers and zero will be described in this answer. Assume that for any of these numbers there is a corresponding negative number, indicated by a separate sign bit.)

In a given encoding of decimal numbers, each of these parameters will have a ‘hard limit’:

P_limit, the maximum precision of the coefficient. This is the maximum length of the coefficient, in digits. Any result from an operation which needs more digits than this will be rounded to fit. If this rounding caused non-zero digits to be removed, the result is Inexact.
E_limit, the maximum encoded exponent. The encoded exponent is a non-negative number, in the range 0 through E_limit, from which the exponent parameter is calculated by subtracting a bias. (This use of a bias makes it easier to compare exponents in a hardware implementation.)

These limits are usually determined by some external factor (often the size of a hardware register). For this discussion, suppose P_limit=7 and E_limit=191 (we'll use these limits for all the examples below – they are conveniently small and correspond to the actual limits in the IEEE 754-2008 decimal 32-bit format).

Within these limits, there is some flexibility in the way numbers can be encoded. We can choose to treat the digits of the coefficient as an integer (in the range 0 through 9999999) or we can apply a scale, which is a constant power of ten by which such a coefficient is divided. (For example, if the scale were 6, the value of the coefficient would be in the range 0 through 9.999999.) Similarly, the bias can be varied to change the range of the exponent (for example, if the bias were 90 then the exponent could take the values -90 through +101).

These two parameters (scale and bias) are related. A given encoding (for example a coefficient encoded as 9999999 and an exponent encoded as 90) will have the same value when the scale is 6 and the bias is 90 as when the scale is 0 and the bias is 96. In the first case, the value of the coefficient is 9.999999 and the exponent is 0 (90 - 90), and in the second case, the value of the coefficient is 9999999 and the exponent is -6 (90 - 96).
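The relationship between scale and bias can be checked with a few lines of Python. This is only an illustrative sketch: the function name decode and its parameter names are invented for this example, and exact fractions are used so that no rounding can obscure the comparison.

```python
from fractions import Fraction

def decode(encoded_coefficient, encoded_exponent, scale, bias):
    """Return the exact value of an encoding as a Fraction (hypothetical helper)."""
    # The scale divides the coefficient digits by a constant power of ten.
    coefficient = Fraction(encoded_coefficient, 10 ** scale)
    # The exponent parameter is the encoded exponent minus the bias.
    exponent = encoded_exponent - bias
    return coefficient * Fraction(10) ** exponent

# The same bit pattern (coefficient 9999999, encoded exponent 90) read two ways.
a = decode(9999999, 90, scale=6, bias=90)   # 9.999999 x 10^0
b = decode(9999999, 90, scale=0, bias=96)   # 9999999 x 10^-6

print(a == b)   # True
print(a)        # 9999999/1000000
```

Raising the scale by 6 while lowering the bias by 6 leaves the value of every encoding unchanged, which is why the scale can be fixed once and only the bias discussed.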

In fact, the choice of scale is arbitrary: for a given scale, we can adjust the bias by the scale so the value of any particular bit pattern (encoding) is unchanged. We can therefore simplify this discussion by choosing a particular scale and then just consider the bias.

For many reasons, decimal numbers are usually described by a pair of integers, and therefore it proves convenient to consider the coefficient of decimal numbers as having a scale of 0 (so the coefficient is an integer). We'll use this value for the other questions in this FAQ. (Bear in mind that for describing the encoding we could equally well have chosen a scale of 6, similar to the traditional way of describing binary floating-point numbers, without affecting the remainder of this discussion other than the need to subtract 6 from the bias.)

How is the exponent bias chosen?

(This answer assumes you have already read the answer to “How are decimal numbers encoded?” where the terminology and examples are explained.)

The choice of bias for a given encoding is largely constrained by the rules of IEEE 854 and the revised IEEE 754-2008. These rules place two requirements on the set of values which must be representable:

A balanced range of exponents is defined by the parameters, E_max and E_min, which determine the overflow threshold (10×10^E_max) and the underflow threshold (1×10^E_min) respectively.
An encoding must be able to represent all possible values with precision up to P_limit whose value is lower than the overflow threshold and greater than or equal to the underflow threshold. These values are called the normal numbers. In our example format, the range of normal numbers is 1×10^E_min through 9.999999×10^E_max.
In addition to the normal numbers, it must also be possible to encode a further range of numbers of lower precision which are smaller than the underflow threshold. These numbers are called subnormal numbers. The smallest subnormal number must be 1×10^E_tiny, where E_tiny is given by E_min - (P_limit - 1). In our example format, the range of subnormal numbers is 1×10^E_tiny through 999999×10^E_tiny (which is 0.000001×10^E_min through 0.999999×10^E_min).

Given these requirements, it would seem that we can now determine E_max and E_min, and hence the bias, given P_limit and E_limit. However, IEEE 854 allows a choice to be made which affects this calculation: the encoding may be redundant if desired:

In a redundant encoding, more than one coefficient (with an appropriate exponent) can be used to represent a given numerical value (for example, the underflow threshold could be represented as either 1000000×10^E_tiny or 1×10^E_min).
In a non-redundant encoding, only one coefficient is used for a given numerical value. (The coefficient chosen usually depends on the scale – when the scale is 0 the smallest coefficient is preferred).

For decimal arithmetic, intended as a tool for human use, the choice here is dictated by the need to mirror manual calculations and other human conventions (see “Why are the encodings unnormalized?”). A non-redundant encoding is inadequate for many applications, so a redundant (dual-integer) encoding is the norm.

This makes a difference at the top of the range of numbers (at the bottom of the range, the subnormal numbers cover the values for which redundant encodings can occur). For example, in the sample format, the largest normal number is 9.999999×10^E_max, which is represented by a coefficient of 9999999 and an exponent of E_max-6. However, the number 9.999990×10^E_max is in the range of normal numbers, and this could be represented by either a coefficient of 9999990 and an exponent of E_max-6 or a coefficient of 999999 and an exponent of E_max-5.

When multiplying numbers using this form of representation, the result coefficient is simply the product of the operand coefficients and the result exponent is the sum of the operand exponents (2E+5 × 3E+7 gives 6E+12). Hence, either encoding for 9.999990×10^E_max can arise by multiplying an appropriate pair of smaller normal numbers together.
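As a rough illustration of this rule, the sketch below multiplies numbers held as (coefficient, exponent) pairs; the helper name multiply is invented for the example, and rounding to P_limit is deliberately left out. Python's decimal module, which follows the same dual-integer model, is used only as a cross-check.

```python
from decimal import Decimal

def multiply(x, y):
    """Multiply two (coefficient, exponent) pairs without rounding (hypothetical helper)."""
    return (x[0] * y[0], x[1] + y[1])

# 2E+5 x 3E+7: coefficients are multiplied, exponents are added.
p1 = multiply((2, 5), (3, 7))     # (6, 12)

# The same left-hand value written as 20E+4 gives a different encoding.
p2 = multiply((20, 4), (3, 7))    # (60, 11)

print(p1, p2)
# Both pairs have the same numerical value.
print(p1[0] * 10 ** p1[1] == p2[0] * 10 ** p2[1])   # True

# Cross-check against the decimal module's result for the first product.
t = (Decimal("2E+5") * Decimal("3E+7")).as_tuple()
print(t.digits, t.exponent)       # (6,) 12
```

The two products are equal in value but differ in coefficient and exponent, which is exactly how redundant encodings of one value arise from ordinary arithmetic.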

Note that the second encoding shown for 9.999990×10^E_max has a larger exponent than the exponent of the largest normal number (E_max-6), and in fact if all the redundant encodings which use up to P_limit digits are allowed, the largest exponent used in a representation will be E_max. (For example, the number 9.000000×10^E_max encoded with a coefficient of 9 and an exponent of E_max.)

It might seem that we can avoid using these larger exponents by converting any such result to a value with a larger coefficient (for example, encoding the number 9.000000×10^E_max with a coefficient of 9000000 and an exponent of E_max-6). This process is called clamping.

However, if we do this the result of a multiplication could differ depending on the encoding used, even though the value of the result is in the range of normal numbers and there is no overflow. If the same calculation were carried out in a format which had a greater exponent range, those results which had their exponent reduced (normalized) in the restricted format would not be normalized in the larger format: they would have a different coefficient and exponent.

Similarly, if the same calculation were carried out by hand, or by using existing computer decimal arithmetic (such as in Java, C#, or Rexx), we would not get the normalized result. This normalization would be an artifact which only appeared near a physical format (encoding) boundary.

This disadvantage, it was decided in committee, is outweighed by the wider exponent range achieved (and the avoidance of invalid ‘supernormal’ numbers), and hence clamping is assumed.

Therefore, the largest exponent needed is E_max-6, and the smallest exponent needed is E_min-6. Providing a range of exponents bounded by these values allows us to meet all the requirements of IEEE 854 and decimal arithmetic.
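The effect of clamping can be observed with Python's decimal module, whose Context has a clamp setting. The sketch below assumes a context set up to resemble the example format (precision 7, Emax 96, Emin -95); it is an illustration of the behavior described above, not a description of any particular hardware.

```python
from decimal import Context

# Two contexts resembling the example format, differing only in clamping.
unclamped = Context(prec=7, Emax=96, Emin=-95, clamp=0)
clamped = Context(prec=7, Emax=96, Emin=-95, clamp=1)

a = unclamped.create_decimal("9E+96")
b = clamped.create_decimal("9E+96")

# Without clamping the coefficient stays 9 and the exponent is E_max.
print(a.as_tuple().digits, a.as_tuple().exponent)   # (9,) 96

# With clamping the coefficient is padded with zeros so that the
# exponent is no larger than E_max - 6.
print(b.as_tuple().digits, b.as_tuple().exponent)   # (9, 0, 0, 0, 0, 0, 0) 90

print(a == b)   # True: same value, different encoding
```

The value is unchanged in both cases; only the choice of coefficient and exponent differs, which is the artifact near the format boundary discussed above.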

From these two figures, we can easily calculate the E_max for a given E_limit. In our example format there must be 2 × E_max exponent values (the -E_min+6 negative values, the E_max-6 positive values, and 0). As E_limit is 191, there are 192 values available and so E_max must be +96. (In general, E_max = (E_limit+1) ÷ 2, and E_min = -(E_max - 1).)

The value of the bias follows directly from the value of E_min. The smallest exponent (E_tiny) must be -101, and this will be encoded as 0. The bias is therefore 101. (In general: bias = -E_min + P_limit - 1.)
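The general formulas above can be collected into a short calculation. The function name derive_parameters and the dictionary keys are invented for this sketch; the second call uses P_limit=16 and E_limit=767, which are assumed here to be the corresponding limits of the 64-bit format.

```python
def derive_parameters(p_limit, e_limit):
    """Derive exponent parameters from the two hard limits (hypothetical helper)."""
    e_max = (e_limit + 1) // 2            # E_max = (E_limit + 1) / 2
    e_min = -(e_max - 1)                  # E_min = -(E_max - 1)
    e_tiny = e_min - (p_limit - 1)        # smallest exponent, encoded as 0
    bias = -e_min + p_limit - 1           # bias = -E_min + P_limit - 1
    largest_exponent = e_max - (p_limit - 1)
    return {
        "e_max": e_max,
        "e_min": e_min,
        "e_tiny": e_tiny,
        "bias": bias,
        "largest_exponent": largest_exponent,
    }

example = derive_parameters(p_limit=7, e_limit=191)
print(example)
# {'e_max': 96, 'e_min': -95, 'e_tiny': -101, 'bias': 101, 'largest_exponent': 90}

# The largest exponent plus the bias should use the top encoded value exactly.
print(example["largest_exponent"] + example["bias"])   # 191

# Assumed limits for the 64-bit format.
print(derive_parameters(p_limit=16, e_limit=767)["e_max"])   # 384
```

As a consistency check, the smallest exponent encodes as 0 and the largest as E_limit, so every available encoded exponent is used.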

How are the parts of decimal numbers encoded?

In the early days of electronic computers, many computers were decimal (some even used decimal numbers for addressing storage), and a great variety of both fixed-point and floating-point decimal encodings were used.

Over the years, most of these encodings were abandoned, but the form of decimal encoding that has endured (because of its practicality and usefulness) is the dual-integer encoding. Dual-integer encodings describe a decimal number using two integers: a coefficient and an exponent (often called a scale, which is a negative exponent). The value of such a number is

coefficient × 10^exponent

(For example, if the coefficient were 123 and the exponent -2, the value of the number is 1.23.)

These two integers can be encoded in various ways. The exponent is almost always encoded as a small binary integer (up to 32 bits). The coefficient is generally encoded in one of three forms:

Binary Coded Decimal (BCD). Here the coefficient is encoded as a series of decimal digits, each encoded in four bits with weights 8-4-2-1. The sign is either held separately or as a 4-bit code outside the range 0-9 which follows the digits of the coefficient.
BCD is less efficient in space than a binary integer encoding, but it is much easier to convert a decimal number in this form to and from a character string representation. Rounding after any operation, and alignment before an addition or subtraction, are simplest in this form.
Binary. Here the coefficient is encoded as a ‘pure binary’ number. This is more efficient in space than BCD, and faster for multiplications, but alignment, rounding, conversions, and any operations that deal with the digits in a number (such as calculating the check digit for a credit card or bar code) are slower. In the case of rounding, two multiplies and several other operations are usually needed, which is a punitive overhead.
Base 100 derivations. Some database storage schemes store two decimal digits in a byte, using the values 0-99 in each byte, with various conventions for handling negative numbers. These schemes provide most of the advantages of BCD, but tend to be more complex.
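To make the three coefficient forms concrete, the sketch below encodes the same non-negative coefficient in each of them. The function names are invented for this example, the sign is ignored, and real BCD and base-100 schemes differ in details such as sign codes and padding.

```python
def to_bcd(coefficient):
    """Pack decimal digits two per byte, four bits each (8-4-2-1 weights)."""
    digits = str(coefficient)
    if len(digits) % 2:
        digits = "0" + digits             # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )

def to_base100(coefficient):
    """Store two decimal digits per byte as a value in the range 0-99."""
    digits = str(coefficient)
    if len(digits) % 2:
        digits = "0" + digits
    return bytes(int(digits[i:i + 2]) for i in range(0, len(digits), 2))

def to_binary(coefficient):
    """Store the coefficient as a pure binary integer, big-endian."""
    length = max(1, (coefficient.bit_length() + 7) // 8)
    return coefficient.to_bytes(length, "big")

print(to_bcd(1234567).hex())       # 01234567
print(to_base100(1234567).hex())   # 01172d43
print(to_binary(1234567).hex())    # 12d687
```

In the BCD form the decimal digits can be read directly from the hexadecimal dump, which is why conversion to and from character strings, alignment, and rounding are simple; the binary form is a byte shorter here but its digits are not directly visible.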

The IEEE 754 decimal encodings for decimal numbers are also dual-integer in form, but use a compressed form of BCD (Densely Packed Decimal) which allows a higher precision decimal number in a given size. For example, a 64-bit encoded number can hold a 16-digit coefficient with a maximum normal exponent (E_max) of +384. In contrast, if BCD were used for the coefficient it would be limited to 13 digits, with a reduced maximum exponent of +64.

The Densely Packed Decimal encoding can be expanded to (or compressed from) BCD very rapidly (using table lookup in software or very simple logic with 2–3 gate delays in hardware). This means that the advantages of having a BCD encoding are preserved while allowing more precision and range in calculations.

How are zeros encoded?

Any number whose coefficient is zero has the value zero. There are therefore many redundant encodings of zero, and the exponents of the operands of a calculation are preserved when the answer is zero in the same way as they are when the result is non-zero.

For details, see “How are zeros with exponents handled?”.
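A brief illustration, using Python's decimal module as one implementation of this model (its default context is assumed): zero results keep an exponent derived from the operands, just as non-zero results do.

```python
from decimal import Decimal

# Subtraction: the result exponent is that of the operands (-2).
z1 = Decimal("1.00") - Decimal("1.00")

# Multiplication: the result exponent is the sum of the operand exponents.
z2 = Decimal("0") * Decimal("1E+3")

print(z1.as_tuple().digits, z1.as_tuple().exponent)   # (0,) -2
print(z2.as_tuple().digits, z2.as_tuple().exponent)   # (0,) 3

# Both are encodings of the value zero.
print(z1 == z2 == 0)   # True
```

Both results have a coefficient of zero and therefore the value zero, but they are distinct encodings.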

Why are the encodings unnormalized?

For decimal arithmetic, intended as a tool for human use, the choice of an unnormalized encoding is dictated by the need to mirror manual calculations and other human conventions. A normalized (non-redundant) encoding is ideal for purely mathematical calculations, but is inadequate for many other applications. Notably:

Unnormalized encodings are used in existing languages, databases, and applications.
Decimal arithmetic in computing almost invariably uses a scaled integer representation. For example, the languages COBOL, PL/I, Java, C#, Rexx, Visual Basic and the databases DB2, Oracle, MS SQL Server, and Informix all use this form of encoding, as do decimal arithmetic libraries, including decNumber for C, bignum for Perl 6, Decimal in Python 2.4, EDA for Eiffel, ArciMath and IBM's BigDecimal classes for Java, ADAR for Ada, and the X/Open ISAM decimal type. In engineering, the same representation is often used – for example, in resistor color codes.
A normalized encoding would mean that the specification could not support these uses, and these existing software decimal calculations could not be replaced by hardware which used a normalized encoding, because in up to 27% of cases the resulting coefficient and exponent would be different. This would require that all applications and their testcases be rewritten; an effort comparable to but significantly larger than the ‘Year 2000’ problem.
A normalized format could therefore only be used to store the integer coefficient of the numbers, with the decimal exponent being calculated and held separately (as in software today). Although this would give some performance improvement over a purely software implementation, all the calculation of exponents, calculating the length of the result, testing for rounding and overflow, etc., would still have to be done serially and in software, hence largely obliterating the potential performance advantages – while requiring the programmer to provide the rules of arithmetic instead of building them into the hardware.
All decimal forms can be derived by constraining the unnormalized encoding.
Three classes of decimal data are in common use:
- Integers (for example, numbers of items, call times in seconds): these are the numbers whose exponents are constrained to be 0. ‘Large integers’, such as 14 billion, are those whose exponents are greater than 0.
- Fixed-point (for example, $1.50, 1.200m): these are the numbers whose exponents are constrained to be a fixed value which is less than zero (-2 and -3 for the two examples).
- Floating-point (for example, exchange rates, tax rates, intermediate values in calculations): these are the numbers whose exponent is unconstrained (except for some maximum and minimum value). The coefficient may be constrained to a given precision (as for exchange rates or COBOL intermediate values), and it may be normalized or unnormalized, depending on the application.
These types are all subsets of the standard type.
An unnormalized encoding allows more efficient handling of integer quantities.
Decimal arithmetic often involves mixed integer and fractional arithmetic (for example: total 144 items at $17.99). Conversions to and from integers, whether decimal or binary, are simpler and faster with an unnormalized representation.
When the coefficient is itself an integer, as in the specification, this becomes effectively a copy, with the exponent being set to 0. Integer arithmetic on these values is then a trivial subset of the floating-point arithmetic.
In contrast, with a normalized representation the integers 10 and 11 must be stored with different exponents, and the coefficients will be shifted differently.
Zero is not a special case.
With an unnormalized encoding, the value zero is simply a number with a coefficient of zero (for details, see “How are zeros encoded?”). Zeros are treated in just the same way as any other number and need no special processing.
For example, the sums 1 + 1 and 1 + 0 can be handled identically, whether in software or in hardware.
In contrast, the coefficient in a normalized representation must always be non-zero, and so zero must have a special coding. This, in turn, means that there has to be a separate pathway for zero operands of every instruction, and a separate pathway to handle zero results. Further, every conversion has to test for zero (in both directions) instead of treating all numbers in the same manner.
Conversions from existing types require only integer conversions.
All existing decimal datatypes are encoded as a scaled integer (see “How are the parts of decimal numbers encoded?”), where the scale is a binary number and the coefficient is an integer (which might be binary, BCD, or some other format).
All of these types need only the appropriate integer conversion to be converted to or from the standard format.
Numbers can be recorded with their precision as written.
An unnormalized encoding allows store-and-forward handling of decimal numbers while preserving their implied exponent, as is required when the model for their use is an integral number of quanta (for example, an integer multiple of millimeters). This is independent of any arithmetic on the numbers.
For example, a number retrieved from a database will have a coefficient (perhaps 250) and an exponent (perhaps -2), having the value 2.50. With an unnormalized layout, this number can be stored and then later retrieved without loss of information. With a normalized layout, it would have to be stored as 25 with an exponent of -1 (or a fraction-coefficient equivalent) and the original values of coefficient and exponent cannot be reconstructed.
Similarly, when the characteristic exponent of a number is preserved by using an unnormalized layout, it is possible to separate the display of a number from its calculation. The display component of an application can safely display a number knowing that the intent of the logic that produced it is preserved; this means that the process of displaying can be guaranteed to not alter the value of a number and instead be only concerned with locale-dependent aspects of display.
For example, if numbers are normalized then the display component may be forced to choose a display exponent for the number (perhaps rounding to two digits after the decimal point). This will hide information present in the number if its exponent were less than -2, and could well introduce rounding errors – or obscure serious errors of calculation.
An unnormalized encoding preserves conventional precision indication in engineering and human-centric applications.
If trailing fractional zeros are removed, measurements and contracts may appear to be more vague (less precise) than intended, and information is lost. For example:
- The length of a steel beam might be specified as 1.200 meters; if this value is altered to 1.2 meters then a contractor might be entitled to provide a beam that is within 5 centimeters of that length, rather than measured to the nearest millimeter.
- Geographical survey and mapping records indicate the precision of measurements using fractional trailing zeros as necessary. Loop closure software makes use of this information for distributing errors. If fractional zeros are lost then precisely measured segments will appear imprecise; they will be over-adjusted and the final result will be corrupt.
- The driving directions “Turn left after 14 miles” and “Turn left after 14.0 miles” lead to different driver behavior.
Human-oriented applications require unnormalized encodings.
The results of human calculations and measurements are written in an unnormalized form (1.23 + 1.27 gives 2.50). To preserve these in their familiar form, they must be recorded as an unnormalized number.
If hardware forced the software to record data in a normalized form, the end user would have to adapt to the unusual and unexpected format of the computer. Applications which intrude in this way are unacceptable to many people.
An unnormalized encoding allows either normalized or unnormalized arithmetic.
Unnormalized arithmetic is required or expected in many applications (see “Why is the arithmetic unnormalized?”).
A normalized layout would not allow unnormalized arithmetic, whereas an unnormalized layout can support both normalized and unnormalized arithmetic.
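Several of the points above can be seen in a short experiment. The sketch below uses Python's decimal module (one of the libraries listed earlier) in its default context; the variable names are chosen only for this illustration.

```python
from decimal import Decimal

# A value retrieved from a database: coefficient 250, exponent -2.
stored = Decimal("2.50")
print(stored.as_tuple().digits, stored.as_tuple().exponent)   # (2, 5, 0) -2

# Normalizing keeps the value but loses the original coefficient and exponent.
normalized = stored.normalize()
print(normalized.as_tuple().digits, normalized.as_tuple().exponent)   # (2, 5) -1
print(stored == normalized)   # True

# Unnormalized arithmetic gives the result as it would be written by hand.
total = Decimal("1.23") + Decimal("1.27")
print(total)   # 2.50

# Mixed integer and fractional arithmetic: 144 items at $17.99.
cost = Decimal(144) * Decimal("17.99")
print(cost)    # 2590.56

# The beam length loses its indication of millimeter precision.
beam = Decimal("1.200")
print(beam, beam.normalize())   # 1.200 1.2
```

An unnormalized layout can hold either form, so an application that wants normalized results can still request them explicitly, whereas the reverse is not possible.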

Please send any comments or corrections to Mike Cowlishaw, mfc@speleotrove.com