This document summarizes the decimal data encoding presented in the
paper:
Densely Packed Decimal Encoding, by Mike Cowlishaw, in
IEE Proceedings – Computers and Digital Techniques,
ISSN 13502387, Vol. 149, No. 3, pp102–104, IEE, May 2002

Abstract:
ChenHo encoding is a lossless compression of three Binary Coded
Decimal digits into 10 bits using an algorithm which can be applied or
reversed using only simple Boolean operations. An improvement to the
encoding which has the same advantages but is not limited to multiples
of three digits is described. The new encoding allows
arbitrarylength decimal numbers to be coded efficiently while keeping
decimal digit boundaries accessible. This in turn permits efficient
decimal arithmetic and makes the best use of available resources such
as storage or hardware registers.


ChenHo encoding is described in the paper StorageEfficient
Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho,
in CACM (18)1, pp.49–52, January 1975.
Their encoding scheme is summarized in
http://speleotrove.com/decimal/chenho.html.
Briefly, ChenHo encoding compresses three decimal digits into 10 bits
with a 0.34% wastage, giving a 20% more efficient encoding than simple
BCD (one digit in 4 bits). The primary advantage of the encoding
over a pure binary representation in ten bits is that no arithmetic
is needed for conversion to or from BCD. Only a very few boolean
operations are needed for conversions – in hardware, encoding
or decoding can be achieved with only 2–3 gate delays; in
software, a simple table lookup suffices. In addition, the encoding
has other advantages, for example, the leastsignificant bit of each
digit remains unencoded, which allows bitperdigit operations to be
effected directly.
The new encoding described here uses an equivalent encoding to the
ChenHo scheme, but instead of a Huffman code it uses a novel
arrangement of bits which give further advantages:
 Compression of one or two decimal digits (into the optimal four or
seven bits respectively) is achieved as a subset of the 3digit
encoding. This means that arbitrary numbers of decimal digits can be
encoded efficiently. For example, 38 decimal digits can be encoded in
127 bits, or 71 digits can be encoded in 237.
In contrast, ChenHo encoding requires that the compressed
representation be a multiple of 10 bits and hence that the number of
decimal digits always be a multiple of three.

The encodings for one or two decimal digits are rightaligned in the ten
bits (the remaining bits being 0). This means that encoded decimal
numbers can be expanded into a longer field simply by padding with zero
bits; no reencoding is necessary.
In contrast, ChenHo encoding requires that expanding an encoded two
digits into a threedigit field (for example) would need a reencoding
instead of simple padding.

The positioning and choice of the indicator bits allow all singledigit
numbers (indeed, all numbers in the range 0 through 79) to have the
same rightaligned encoding as in BCD. This makes conversions of common
small numbers trivial.
In contrast, ChenHo encoding only maps the numbers 0 through 7
unchanged.
These advantages make the new encoding a better choice than ChenHo
encoding for both hardware and software representations of decimal
numbers.
Here are some examples:
BCD 
ChenHo 
Densely Packed 
005 
000 000 0101 
000 000 0101 
009 
110 000 0001 
000 000 1001 
055 
000 010 1101 
000 101 0101 
099 
111 000 1001 
000 101 1111 
555 
010 110 1101 
101 101 0101 
999 
111 111 1001 
001 111 1111 

Note that, necessarily, neither encoding is directly sortable. For
example, if the encoded values are sorted in ascending order as
though they were binary strings and then decoded to BCD, the BCD
results will not be in ascending order.
Details
The new encoding, like ChenHo encoding, considers each of the three
digits as either being small (0–7, requiring 3 bits to
distinguish) or large (8 or 9, requiring one bit). The top bit of
each BCD digit is 0 for small values, and 1 for the two large values.
The possible combinations of these ranges are then:
a) 
All digits are small (51.2% of the possibilities).
This requires 3+3+3 bits for the digits, leaving 1 bit to indicate
this combination.

b) 
Two digits are small (38.4%).
This requires 3+3+1 bits for the digits, leaving 3 to indicate
this combination.

c) 
One digit is small (9.6%).
This requires 3+1+1 bits for the digits, leaving 5 to indicate
this combination.

d) 
No digits are small (0.8%).
This requires 1+1+1 bits for the digits, leaving 7 (only 5 are
needed) for the indication.

The following tables fully describe the encoding (compression from BCD)
and decoding (expansion to BCD); the letters a–k and m
represent the 12 bits of three BCD digits, and p–y represent
the 10 bits of the encoded digits.
Compression: (abcd)(efgh)(ijkm) becomes (pqr)(stu)(v)(wxy)
aei 
pqr stu v wxy 
Comments 
000
001
010
100
110
101
011
111 
bcd fgh 0 jkm
bcd fgh 1 00m
bcd jkh 1 01m
jkd fgh 1 10m
jkd 00h 1 11m
fgd 01h 1 11m
bcd 10h 1 11m
00d 11h 1 11m 
All digits are small
Right digit is large [this keeps 09 unchanged]
Middle digit is large
Left digit is large
Right digit is small [L & M are large]
Middle digit is small [L & R are large]
Left digit is small [M & R are large]
All digits are large; two bits are unused

(24 of the 1024 values are redundant and could be used for future
extensions; however they are generally mapped to specific
combinations of 8s and 9s as defined by the equations in the
DPD paper.)



Expansion: (pqr)(stu)(v)(wxy) becomes (abcd)(efgh)(ijkm)
vwxst  abcd efgh ijkm 
0....
100..
101..
110..
11100
11101
11110
11111

0pqr 0stu 0wxy
0pqr 0stu 100y
0pqr 100u 0sty
100r 0stu 0pqy
100r 100u 0pqy
100r 0pqu 100y
0pqr 100u 100y
100r 100u 100y

('.' means don'tcare.)



In hardware, simple boolean operations on input bits define each output
bit (for example, during compression, r=d and
v=a+e+i). The operations can be derived from the tables,
and also appear as boolean expressions in the sample code below.
For software, table lookup is a simple and efficient solution. Below is
a Rexx program which is easy to
modify for generating table constants (or it can be ported to a
different language).
/* dpdGenerate.rex  display Densely Packed Decimal table */
/* mfc 2000.10.03; Rexx version with new equations 2007.02.01 */
do i=0 to 999
bcd=right(i, 3, 0)  make threedigit hexadecimal string
bit10=bcd2dpd(x2b(bcd))  compress
bit12=dpd2bcd(bit10)  expand
say bcd bit10 bit12  display
end i
exit
/* bcd2dpd  Compress BCD to Densely Packed Decimal
argument is a string of 12 characters, each 0 or 1, being 3 digits
of 4 bits, each being a valid BCD digit (00001001)
(for example, 923 is 100100100011)
result is a string of 10 characters, each 0 or 1
(for the example, this would be 0110101101)
*/
bcd2dpd: procedure
 assign each bit to a variable, named as in the description
parse arg a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 m +1
 derive the result bits, using boolean expressions only
 [the operators are: '&'=AND, ''=OR, '\'=NOT.]
p=b  (a & j)  (a & f & i)
q=c  (a & k)  (a & g & i)
r=d
s=(f & (\a  \i))  (\a & e & j)  (e & i)
t=g  (\a & e &k)  (a & i)
u=h
v=a  e  i
w=a  (e & i)  (\e & j)
x=e  (a & i)  (\a & k)
y=m
 concatenate the bits and return
return pqrstuvwxy
/* dpd2bcd  Expand Densely Packed Decimal to BCD
argument is a string of 10 characters, each 0 or 1; all 1024
possibilities are accepted (noncanonicals > various)
(for example, 0110101101)
result is a string of 12 characters, each 0 or 1
(for the example, 100100100011  923)
*/
dpd2bcd: procedure
 assign each bit to a variable, named as in the description
parse arg p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1
 derive the result bits, using boolean expressions only
a= (v & w) & (\s  t  \x)
b=p & (\v  \w  (s & \t & x))
c=q & (\v  \w  (s & \t & x))
d=r
e=v & ((\w & x)  (\t & x)  (s & x))
f=(s & (\v  \x))  (p & \s & t & v & w & x)
g=(t & (\v  \x))  (q & \s & t & w)
h=u
i=v & ((\w & \x)  (w & x & (s  t)))
j=(\v & w)  (s & v & \w & x)  (p & w & (\x  (\s & \t)))
k=(\v & x)  (t & \w & x)  (q & v & w & (\x  (\s & \t)))
m=y
 concatenate the bits and return
return abcdefghijkm

Note: versions of this web page prior to 2007.02.01 (with the NetRexx
code example) used equations for dpd2bcd which assumed the
output was “don’t care” for the 24 redundant
(noncanonical) codes. The equations here result in specific
sequences of 8s and 9s for each redundant code, as specified in the
DPD paper.


Acknowledgement
Many thanks to Mark Erle and Brian Hickmann for suggestions for
improvements to this page and for their careful checking.
Summary by
Mike Cowlishaw
(mfc@speleotrove.com),
3 October 2000, updated 20 May 2001, 26 April 2002, 6 July 2002, 8
January 2003, 16 July 2005, 13 February 2007.
Copyright © IBM Corporation and Mike Cowlishaw, 2000, 2007. All rights
reserved.