Decimal Arithmetic FAQ
Part 5 – Encoding Questions
There are many ways of encoding numbers, but here we'll only discuss decimal numbers which are encoded in a series of contiguous bytes (like binary floats and doubles) and which are described by a pair of parameters: a coefficient which is multiplied by ten raised to the power of an exponent. For information on other forms of decimal encodings, see “How are the parts of decimal numbers encoded?”.
The value of a number encoded with these two parameters is coefficient × 10^exponent. For example, if the coefficient is 9 and the exponent is 3 then the value of the number is 9000 (9×10^3), and if the exponent were -2 then the value would be 0.09 (9×10^-2).
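As a minimal sketch of this rule, the pair can be turned into a value with Python's standard decimal module. The helper name value_of is purely illustrative and is not part of any specification.

```python
from decimal import Decimal

def value_of(coefficient: int, exponent: int) -> Decimal:
    """Return coefficient x 10^exponent exactly (illustrative helper)."""
    digits = tuple(int(ch) for ch in str(coefficient))
    # The tuple constructor takes (sign, digits, exponent) and does not round.
    return Decimal((0, digits, exponent))

print(value_of(9, 3) == 9000)              # same value as 9000
print(value_of(9, -2) == Decimal("0.09"))  # same value as 0.09
```

The tuple form of the constructor is used here because it builds the number directly from its two integers, which is the view of decimal numbers taken in this answer.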
(For simplicity, only positive numbers and zero will be described in this answer. Assume that for any of these numbers there is a corresponding negative number, indicated by a separate sign bit.)
In a given encoding of decimal numbers, each of these parameters will have a ‘hard limit’:
These limits are usually determined by some external factor (often the size of a hardware register). For this discussion, suppose Plimit=7 and Elimit=191 (we'll use these limits for all the examples below – they are conveniently small and correspond to the actual limits in the IEEE 754-2008 decimal 32-bit format).
Within these limits, there is some flexibility in the way numbers can be encoded. We can choose to treat the digits of the coefficient as an integer (in the range 0 through 9999999) or we can apply a scale, which is a constant power of ten by which such a coefficient is divided. (For example, if the scale were 6, the value of the coefficient would be in the range 0 through 9.999999.) Similarly, the bias can be varied to change the range of the exponent (for example, if the bias were 90 then the exponent could take the values -90 through +101).
These two parameters (scale and bias) are related. A given encoding (for example a coefficient encoded as 9999999 and an exponent encoded as 90) will have the same value when the scale is 6 and the bias is 90 as when the scale is 0 and the bias is 96. In the first case, the value of the coefficient is 9.999999 and the exponent is 0, and in the second case, the value of the coefficient is 9999999 and the exponent is -6.
In fact, the choice of scale is arbitrary: for a given scale, we can adjust the bias by the scale so the value of any particular bit pattern (encoding) is unchanged. We can therefore simplify this discussion by choosing a particular scale and then just consider the bias.
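The trade-off between scale and bias can be checked with a small sketch. The function decode and its parameter names are assumptions made for illustration; the rule it applies is only the one described above (divide the encoded coefficient by 10^scale, subtract the bias from the encoded exponent).

```python
from fractions import Fraction

def decode(coef_bits: int, exp_bits: int, scale: int, bias: int) -> Fraction:
    """Value of an encoding under a given scale and bias (illustrative)."""
    coefficient = Fraction(coef_bits, 10 ** scale)  # apply the scale
    exponent = exp_bits - bias                      # remove the bias
    return coefficient * Fraction(10) ** exponent

# Same bit pattern, two descriptions: moving the scale from 6 to 0
# is compensated by raising the bias by 6.
with_scale = decode(9999999, 90, scale=6, bias=90)
as_integer = decode(9999999, 90, scale=0, bias=90 + 6)
print(with_scale == as_integer)  # True
```

Exact fractions are used instead of floats so that the comparison is not disturbed by binary rounding.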
For many reasons, decimal numbers are usually described by a pair of integers, and therefore it proves convenient to consider the coefficient of decimal numbers as having a scale of 0 (so the coefficient is an integer). We'll use this value for the other questions in this FAQ. (Bear in mind that for describing the encoding we could equally well have chosen a scale of 6, similar to the traditional way of describing binary floating-point numbers, without affecting the remainder of this discussion other than the need to subtract 6 from the bias.)
(This answer assumes you have already read the answer to “How are decimal numbers encoded?” where the terminology and examples are explained.)
The choice of bias for a given encoding is largely constrained by the rules of IEEE 854 and the revised IEEE 754-2008. These rules place two requirements on the set of values which must be representable:
An encoding must be able to represent all possible values with precision up to Plimit whose value is lower than the overflow threshold and greater than or equal to the underflow threshold. These values are called the normal numbers. In our example format, the range of normal numbers is 1×10^Emin through 9.999999×10^Emax.
Given these requirements, it would seem that we can now determine Emax and Emin, and hence the bias, given Plimit and Elimit. However, IEEE 854 allows a choice to be made which affects this calculation: the encoding may be redundant if desired:
For decimal arithmetic, intended as a tool for human use, the choice here is dictated by the need to mirror manual calculations and other human conventions (see “Why are the encodings unnormalized?”). A non-redundant encoding is inadequate for many applications, so a redundant (dual-integer) encoding is the norm.
This makes a difference at the top of the range of numbers (at the bottom of the range, the subnormal numbers cover the values for which redundant encodings can occur). For example, in the sample format, the largest normal number is 9.999999×10^Emax, which is represented by a coefficient of 9999999 and an exponent of Emax-6. However, the number 9.999990×10^Emax is in the range of normal numbers, and this could be represented by either a coefficient of 9999990 and an exponent of Emax-6 or a coefficient of 999999 and an exponent of Emax-5.
When multiplying numbers using this form of representation, the result coefficient is simply the product of the operand coefficients and the result exponent is the sum of the operand exponents (2E+5 × 3E+7 gives 6E+12). Hence, either encoding for 9.999990×10Emax can arise by multiplying an appropriate pair of smaller normal numbers together.
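This rule can be observed with Python's standard decimal module, which happens to follow the same arithmetic. The sketch below is only an illustration; the variable names are arbitrary.

```python
from decimal import Decimal

# Coefficients multiply (2 x 3), exponents add (5 + 7).
product = Decimal("2E+5") * Decimal("3E+7")
print(product.as_tuple())

# A different pair of operands gives the same value in another encoding:
# coefficient 20 x 3 = 60, exponent 4 + 7 = 11.
other = Decimal("20E+4") * Decimal("3E+7")
print(other.as_tuple())
print(product == other)  # equal in value, different coefficient and exponent
```

The two results compare equal, yet as_tuple() shows that each keeps the coefficient and exponent produced by its own operands.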
Note that the second encoding shown for 9.999990×10^Emax has a larger exponent than the exponent of the largest normal number (Emax-6), and in fact if all the redundant encodings which use up to Plimit digits are allowed, the largest exponent used in a representation will be Emax. (For example, the number 9.000000×10^Emax encoded with a coefficient of 9 and an exponent of Emax.)
It might seem that we can avoid using these larger exponents by converting any such result to a value with a larger coefficient (for example, encoding the number 9.000000×10^Emax with a coefficient of 9000000 and an exponent of Emax-6). This process is called clamping.
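Python's decimal module has a context flag named clamp which appears to implement this process, so it can serve as a rough illustration. The sketch uses the sample format's precision of 7 and assumes Emax = 96 and Emin = -95 (the decimal32 values) purely for the demonstration.

```python
from decimal import Context

# Two contexts that differ only in whether clamping is applied.
unclamped_ctx = Context(prec=7, Emax=96, Emin=-95, clamp=0)
clamped_ctx = Context(prec=7, Emax=96, Emin=-95, clamp=1)

unclamped = unclamped_ctx.create_decimal("9E+96")
clamped = clamped_ctx.create_decimal("9E+96")

print(unclamped.as_tuple())  # coefficient 9, exponent Emax
print(clamped.as_tuple())    # coefficient padded with zeros, exponent Emax-6
print(unclamped == clamped)  # the value itself is unchanged
```

Only the representation changes: the clamped result carries six extra zeros in its coefficient and an exponent reduced by six.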
However, if we do this the result of a multiplication could differ depending on the encoding used, even though the value of the result is in the range of normal numbers and there is no overflow. If the same calculation were carried out in a format which had a greater exponent range, those results which had their exponent reduced (normalized) in the restricted format would not be normalized in the larger format: they would have a different coefficient and exponent.
Similarly, if the same calculation were carried out by hand, or by using existing computer decimal arithmetic (such as in Java, C#, or Rexx), we would not get the normalized result. This normalization would be an artifact which only appeared near a physical format (encoding) boundary.
This disadvantage, it was decided in committee, is outweighed by the wider exponent range achieved (and the avoidance of invalid ‘supernormal’ numbers), and hence clamping is assumed.
Therefore, the largest exponent needed is Emax-6, and the smallest exponent needed is Emin-6. Providing a range of exponents bounded by these values allows us to meet all the requirements of IEEE 854 and decimal arithmetic.
From these two figures, we can easily calculate the Emax for a given Elimit. In our example format there must be 2 × Emax exponent values (the -Emin+6 negative values, the Emax-6 positive values, and 0). As Elimit is 191, there are 192 values available and so Emax must be +96. (In general, Emax = (Elimit+1) ÷ 2, and Emin = -(Emax - 1).)
The value of the bias follows directly from the value of Emin. The smallest exponent (Etiny) must be -101, and this will be encoded as 0. The bias is therefore 101. (In general: bias = -Emin + Plimit - 1.)
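The general formulas above can be collected into a short sketch. The function and key names are illustrative only; the arithmetic is exactly that given in the text, and the final check confirms that the largest exponent needed (Emax minus Plimit-1) encodes to Elimit.

```python
def format_parameters(p_limit: int, e_limit: int) -> dict:
    """Derive Emax, Emin, Etiny and the bias from the two hard limits."""
    e_max = (e_limit + 1) // 2           # Emax = (Elimit+1) / 2
    e_min = -(e_max - 1)                 # Emin = -(Emax - 1)
    e_tiny = e_min - (p_limit - 1)       # smallest exponent, encoded as 0
    bias = -e_min + p_limit - 1          # bias = -Emin + Plimit - 1
    # The largest exponent needed must land exactly on Elimit once biased.
    assert (e_max - (p_limit - 1)) + bias == e_limit
    return {"Emax": e_max, "Emin": e_min, "Etiny": e_tiny, "bias": bias}

print(format_parameters(7, 191))
```

For the sample format (Plimit=7, Elimit=191) this reproduces the figures derived above: Emax +96, Emin -95, Etiny -101, and a bias of 101.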
In the early days of electronic computers, many computers were decimal (some even used decimal numbers for addressing storage), and a great variety of both fixed-point and floating-point decimal encodings were used.
Over the years, most of these encodings were abandoned, but the form of decimal encoding that has endured (because of its practicality and usefulness) is the dual-integer encoding. Dual-integer encodings describe a decimal number using two integers: a coefficient and an exponent (often called a scale, which is a negative exponent). The value of such a number is
coefficient × 10^exponent
(For example, if the coefficient were 123 and the exponent -2, the value of the number is 1.23.)
These two integers can be encoded in various ways. The exponent is almost always encoded as a small binary integer (up to 32 bits). The coefficient is generally one of three forms:
BCD is less efficient in space than a binary integer encoding, but it is much easier to convert a decimal number in this form to and from a character string representation. Rounding after any operation, and alignment before an addition or subtraction, are simplest in this form.
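The ease of converting BCD to and from character strings can be seen in a small sketch. The layout assumed here (two digits per byte, most significant digit first, a leading zero as padding, no sign nibble) is just one common convention chosen for illustration, and the function names are hypothetical.

```python
def to_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into BCD, two digits per byte."""
    if len(digits) % 2:
        digits = "0" + digits            # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )

def from_bcd(packed: bytes) -> str:
    """Unpack BCD bytes back into a string of decimal digits."""
    return "".join(f"{byte >> 4}{byte & 0x0F}" for byte in packed)

packed = to_bcd("1234567")
print(packed.hex())       # each hex digit is one decimal digit
print(from_bcd(packed))   # digits recovered, including the padding zero
```

Each digit occupies its own four bits, so conversion needs no multiplication or division, and dropping or inserting digits for rounding and alignment amounts to moving whole nibbles.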
The IEEE 754 decimal encodings for decimal numbers are also dual-integer in form, but use a compressed form of BCD (Densely Packed Decimal) which allows a higher precision decimal number in a given size. For example, a 64-bit encoded number can hold a 16-digit coefficient with a maximum normal exponent (Emax) of +384. In contrast, if BCD were used for the coefficient, the coefficient would be 13 digits, with a reduced maximum exponent of +64.
The Densely Packed Decimal encoding can be expanded to (or compressed from) BCD very rapidly (using table lookup in software or very simple logic with 2–3 gate delays in hardware). This means that the advantages of having a BCD encoding are preserved while allowing more precision and range in calculations.
Any number whose coefficient is zero has the value zero. There are therefore many redundant encodings of zero, and the exponents of the operands of a calculation are preserved when the answer is zero in the same way as they are when the result is non-zero.
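Python's decimal module behaves in the way described, so it can be used for a quick illustration (a sketch only; the variable names are arbitrary).

```python
from decimal import Decimal

# Subtraction: the result exponent follows the operands, as for non-zero results.
difference = Decimal("1.10") - Decimal("1.10")
print(difference.as_tuple())   # coefficient 0, exponent -2

# Multiplication: exponents add even though the coefficient is zero.
product = Decimal("0") * Decimal("1E+3")
print(product.as_tuple())      # coefficient 0, exponent 3

print(difference == product == 0)  # all are encodings of zero
```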
For details, see “How are zeros with exponents handled?”.
For decimal arithmetic, intended as a tool for human use, the choice of an unnormalized encoding is dictated by the need to mirror manual calculations and other human conventions. A normalized (non-redundant) encoding is ideal for purely mathematical calculations, but is inadequate for many other applications. Notably:
A normalized encoding would mean that the specification could not support these uses, and these existing software decimal calculations cannot be replaced by hardware which used a normalized encoding, because in up to 27% of cases the resulting coefficient and exponent will be different. This would require that all applications and their testcases be rewritten; an effort comparable to but significantly larger than the ‘Year 2000’ problem.
A normalized format could therefore only be used to store the integer coefficient of the numbers, with the decimal exponent being calculated and held separately (as in software today). Although this would give some performance improvement over a purely software implementation, all the calculation of exponents, calculating the length of the result, testing for rounding and overflow, etc., will still have to be done serially and in software, hence largely obliterating the potential performance advantages – while requiring the programmer to provide the rules of arithmetic instead of building them into the hardware.
These types are all a subset of the standard type.
When the coefficient is itself an integer, as in the specification, this becomes effectively a copy, with the exponent being set to 0. Integer arithmetic on these values is then a trivial subset of the floating-point arithmetic.
In contrast, with a normalized representation the integers 10 and 11 must be stored with different exponents, and the coefficients will be shifted differently.
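The difference can be sketched with Python's decimal module, whose normalize() method strips trailing zeros from the coefficient and is used here as a stand-in for a normalized representation (an assumption made for illustration).

```python
from decimal import Decimal

ten, eleven = Decimal(10), Decimal(11)

# Unnormalized: both integers are stored with exponent 0.
print(ten.as_tuple(), eleven.as_tuple())

# After normalization 10 becomes coefficient 1 with exponent 1,
# while 11 keeps exponent 0.
print(ten.normalize().as_tuple(), eleven.normalize().as_tuple())
```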
For example, the sums 1 + 1 and 1 + 0 can be handled identically, whether in software or in hardware.
In contrast, the coefficient in a normalized representation must always be non-zero, and so zero must have a special coding. This, in turn, means that there has to be a separate pathway for zero operands of every instruction, and a separate pathway to handle zero results. Further, every conversion has to test for zero (in both directions) instead of treating all numbers in the same manner.
All of these types need only the appropriate integer conversion to be converted to or from the standard format.
For example, a number retrieved from a database will have a coefficient (perhaps 250) and an exponent (perhaps -2), having the value 2.50. With an unnormalized layout, this number can be stored and then later retrieved without loss of information. With a normalized layout, it would have to be stored as 25 with an exponent of -1 (or a fraction-coefficient equivalent) and the original values of coefficient and exponent cannot be reconstructed.
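The loss of information can be demonstrated with Python's decimal module, again using normalize() as a stand-in for storage in a normalized layout (a sketch under that assumption).

```python
from decimal import Decimal

# The number as retrieved: coefficient 250, exponent -2.
stored = Decimal((0, (2, 5, 0), -2))
print(stored, stored.as_tuple())          # displays as 2.50

# Forcing a normalized form discards the trailing zero.
normalized = stored.normalize()
print(normalized, normalized.as_tuple())  # displays as 2.5

print(stored == normalized)  # same value, but the original pair is gone
```

The two objects compare equal, but nothing in the normalized form records that the original coefficient was 250 and the exponent -2.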
Similarly, when the characteristic exponent of a number is preserved by using an unnormalized layout, it is possible to separate the display of a number from its calculation. The display component of an application can safely display a number knowing that the intent of the logic that produced it is preserved; this means that the process of displaying can be guaranteed to not alter the value of a number and instead be only concerned with locale-dependent aspects of display.
For example, if numbers are normalized then the display component may be forced to choose a display exponent for the number (perhaps rounding to two digits after the decimal point). This will hide information present in the number if its exponent were less than -2, and could well introduce rounding errors – or obscure serious errors of calculation.
If hardware forced the software to record data in a normalized form, the end user would have to adapt to the unusual and unexpected format of the computer. Applications which intrude in this way are unacceptable to many people.
A normalized layout would not allow unnormalized arithmetic, whereas an unnormalized layout can support both normalized and unnormalized arithmetic.
Please send any comments or corrections to Mike Cowlishaw, mfc@speleotrove.com
Copyright © IBM Corporation 2000, 2007. All rights reserved.