This document summarizes the decimal data encoding presented in the
paper:
Densely Packed Decimal Encoding, by Mike Cowlishaw, in
IEE Proceedings – Computers and Digital Techniques,
ISSN 1350-2387, Vol. 149, No. 3, pp102–104, IEE, May 2002
|
Abstract:
Chen-Ho encoding is a lossless compression of three Binary Coded
Decimal digits into 10 bits using an algorithm which can be applied or
reversed using only simple Boolean operations. An improvement to the
encoding which has the same advantages but is not limited to multiples
of three digits is described. The new encoding allows
arbitrary-length decimal numbers to be coded efficiently while keeping
decimal digit boundaries accessible. This in turn permits efficient
decimal arithmetic and makes the best use of available resources such
as storage or hardware registers.
|
|
Chen-Ho encoding is described in the paper Storage-Efficient
Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho,
in CACM (18)1, pp.49–52, January 1975.
Their encoding scheme is summarized in
http://speleotrove.com/decimal/chen-ho.html.
Briefly, Chen-Ho encoding compresses three decimal digits into 10 bits
with a 0.34% wastage, giving a 20% more efficient encoding than simple
BCD (one digit in 4 bits). The primary advantage of the encoding
over a pure binary representation in ten bits is that no arithmetic
is needed for conversion to or from BCD. Only a very few boolean
operations are needed for conversions – in hardware, encoding
or decoding can be achieved with only 2–3 gate delays; in
software, a simple table look-up suffices. In addition, the encoding
has other advantages, for example, the least-significant bit of each
digit remains unencoded, which allows bit-per-digit operations to be
effected directly.
The new encoding described here uses an equivalent encoding to the
Chen-Ho scheme, but instead of a Huffman code it uses a novel
arrangement of bits which give further advantages:
- Compression of one or two decimal digits (into the optimal four or
seven bits respectively) is achieved as a subset of the 3-digit
encoding. This means that arbitrary numbers of decimal digits can be
encoded efficiently. For example, 38 decimal digits can be encoded in
127 bits, or 71 digits can be encoded in 237.
In contrast, Chen-Ho encoding requires that the compressed
representation be a multiple of 10 bits and hence that the number of
decimal digits always be a multiple of three.
-
The encodings for one or two decimal digits are right-aligned in the ten
bits (the remaining bits being 0). This means that encoded decimal
numbers can be expanded into a longer field simply by padding with zero
bits; no re-encoding is necessary.
In contrast, Chen-Ho encoding requires that expanding an encoded two
digits into a three-digit field (for example) would need a re-encoding
instead of simple padding.
-
The positioning and choice of the indicator bits allow all single-digit
numbers (indeed, all numbers in the range 0 through 79) to have the
same right-aligned encoding as in BCD. This makes conversions of common
small numbers trivial.
In contrast, Chen-Ho encoding only maps the numbers 0 through 7
unchanged.
These advantages make the new encoding a better choice than Chen-Ho
encoding for both hardware and software representations of decimal
numbers.
Here are some examples:
BCD |
Chen-Ho |
Densely Packed |
005 |
000 000 0101 |
000 000 0101 |
009 |
110 000 0001 |
000 000 1001 |
055 |
000 010 1101 |
000 101 0101 |
099 |
111 000 1001 |
000 101 1111 |
555 |
010 110 1101 |
101 101 0101 |
999 |
111 111 1001 |
001 111 1111 |
|
Note that, necessarily, neither encoding is directly sortable. For
example, if the encoded values are sorted in ascending order as
though they were binary strings and then decoded to BCD, the BCD
results will not be in ascending order.
Details
The new encoding, like Chen-Ho encoding, considers each of the three
digits as either being small (0–7, requiring 3 bits to
distinguish) or large (8 or 9, requiring one bit). The top bit of
each BCD digit is 0 for small values, and 1 for the two large values.
The possible combinations of these ranges are then:
a) |
All digits are small (51.2% of the possibilities).
This requires 3+3+3 bits for the digits, leaving 1 bit to indicate
this combination.
|
b) |
Two digits are small (38.4%).
This requires 3+3+1 bits for the digits, leaving 3 to indicate
this combination.
|
c) |
One digit is small (9.6%).
This requires 3+1+1 bits for the digits, leaving 5 to indicate
this combination.
|
d) |
No digits are small (0.8%).
This requires 1+1+1 bits for the digits, leaving 7 (only 5 are
needed) for the indication.
|
The following tables fully describe the encoding (compression from BCD)
and decoding (expansion to BCD); the letters a–k and m
represent the 12 bits of three BCD digits, and p–y represent
the 10 bits of the encoded digits.
Compression: (abcd)(efgh)(ijkm) becomes (pqr)(stu)(v)(wxy)
aei |
pqr stu v wxy |
Comments |
000
001
010
100
110
101
011
111 |
bcd fgh 0 jkm
bcd fgh 1 00m
bcd jkh 1 01m
jkd fgh 1 10m
jkd 00h 1 11m
fgd 01h 1 11m
bcd 10h 1 11m
00d 11h 1 11m |
All digits are small
Right digit is large [this keeps 0-9 unchanged]
Middle digit is large
Left digit is large
Right digit is small [L & M are large]
Middle digit is small [L & R are large]
Left digit is small [M & R are large]
All digits are large; two bits are unused
|
(24 of the 1024 values are redundant and could be used for future
extensions; however they are generally mapped to specific
combinations of 8s and 9s as defined by the equations in the
DPD paper.)
|
|
|
Expansion: (pqr)(stu)(v)(wxy) becomes (abcd)(efgh)(ijkm)
vwxst | abcd efgh ijkm |
0....
100..
101..
110..
11100
11101
11110
11111
|
0pqr 0stu 0wxy
0pqr 0stu 100y
0pqr 100u 0sty
100r 0stu 0pqy
100r 100u 0pqy
100r 0pqu 100y
0pqr 100u 100y
100r 100u 100y
|
('.' means don't-care.)
|
|
|
In hardware, simple boolean operations on input bits define each output
bit (for example, during compression, r=d and
v=a+e+i). The operations can be derived from the tables,
and also appear as boolean expressions in the sample code below.
For software, table lookup is a simple and efficient solution. Below is
a Rexx program which is easy to
modify for generating table constants (or it can be ported to a
different language).
/* dpdGenerate.rex -- display Densely Packed Decimal table */
/* mfc 2000.10.03; Rexx version with new equations 2007.02.01 */
do i=0 to 999
bcd=right(i, 3, 0) -- make three-digit hexadecimal string
bit10=bcd2dpd(x2b(bcd)) -- compress
bit12=dpd2bcd(bit10) -- expand
say bcd bit10 bit12 -- display
end i
exit
/* bcd2dpd -- Compress BCD to Densely Packed Decimal
argument is a string of 12 characters, each 0 or 1, being 3 digits
of 4 bits, each being a valid BCD digit (0000-1001)
(for example, 923 is 100100100011)
result is a string of 10 characters, each 0 or 1
(for the example, this would be 0110101101)
*/
bcd2dpd: procedure
-- assign each bit to a variable, named as in the description
parse arg a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 m +1
-- derive the result bits, using boolean expressions only
-- [the operators are: '&'=AND, '|'=OR, '\'=NOT.]
p=b | (a & j) | (a & f & i)
q=c | (a & k) | (a & g & i)
r=d
s=(f & (\a | \i)) | (\a & e & j) | (e & i)
t=g | (\a & e &k) | (a & i)
u=h
v=a | e | i
w=a | (e & i) | (\e & j)
x=e | (a & i) | (\a & k)
y=m
-- concatenate the bits and return
return p||q||r||s||t||u||v||w||x||y
/* dpd2bcd -- Expand Densely Packed Decimal to BCD
argument is a string of 10 characters, each 0 or 1; all 1024
possibilities are accepted (non-canonicals -> various)
(for example, 0110101101)
result is a string of 12 characters, each 0 or 1
(for the example, 100100100011 -- 923)
*/
dpd2bcd: procedure
-- assign each bit to a variable, named as in the description
parse arg p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1
-- derive the result bits, using boolean expressions only
a= (v & w) & (\s | t | \x)
b=p & (\v | \w | (s & \t & x))
c=q & (\v | \w | (s & \t & x))
d=r
e=v & ((\w & x) | (\t & x) | (s & x))
f=(s & (\v | \x)) | (p & \s & t & v & w & x)
g=(t & (\v | \x)) | (q & \s & t & w)
h=u
i=v & ((\w & \x) | (w & x & (s | t)))
j=(\v & w) | (s & v & \w & x) | (p & w & (\x | (\s & \t)))
k=(\v & x) | (t & \w & x) | (q & v & w & (\x | (\s & \t)))
m=y
-- concatenate the bits and return
return a||b||c||d||e||f||g||h||i||j||k||m
|
Note: versions of this web page prior to 2007.02.01 (with the NetRexx
code example) used equations for dpd2bcd which assumed the
output was “don’t care” for the 24 redundant
(non-canonical) codes. The equations here result in specific
sequences of 8s and 9s for each redundant code, as specified in the
DPD paper.
|
|
Acknowledgement
Many thanks to Mark Erle and Brian Hickmann for suggestions for
improvements to this page and for their careful checking.
Summary by
Mike Cowlishaw
(mfc@speleotrove.com),
3 October 2000, updated 20 May 2001, 26 April 2002, 6 July 2002, 8
January 2003, 16 July 2005, 13 February 2007.
Copyright © IBM Corporation and Mike Cowlishaw, 2000, 2007. All rights
reserved.