A Summary of Densely Packed Decimal encoding


This document summarizes the decimal data encoding presented in the paper:
Densely Packed Decimal Encoding, by Mike Cowlishaw, in IEE Proceedings – Computers and Digital Techniques, ISSN 1350-2387, Vol. 149, No. 3, pp102–104, IEE, May 2002
Abstract: Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations. An improvement to the encoding which has the same advantages but is not limited to multiples of three digits is described. The new encoding allows arbitrary-length decimal numbers to be coded efficiently while keeping decimal digit boundaries accessible. This in turn permits efficient decimal arithmetic and makes the best use of available resources such as storage or hardware registers.

Chen-Ho encoding is described in the paper Storage-Efficient Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho, in CACM (18)1, pp.49–52, January 1975. Their encoding scheme is summarized in http://speleotrove.com/decimal/chen-ho.html.

Briefly, Chen-Ho encoding compresses three decimal digits into 10 bits with a 0.34% wastage, giving a 20% more efficient encoding than simple BCD (one digit in 4 bits). The primary advantage of the encoding over a pure binary representation in ten bits is that no arithmetic is needed for conversion to or from BCD. Only a very few boolean operations are needed for conversions – in hardware, encoding or decoding can be achieved with only 2–3 gate delays; in software, a simple table look-up suffices. In addition, the encoding has other advantages, for example, the least-significant bit of each digit remains unencoded, which allows bit-per-digit operations to be effected directly.

The new encoding described here uses an equivalent encoding to the Chen-Ho scheme, but instead of a Huffman code it uses a novel arrangement of bits which give further advantages:

  1. Compression of one or two decimal digits (into the optimal four or seven bits respectively) is achieved as a subset of the 3-digit encoding. This means that arbitrary numbers of decimal digits can be encoded efficiently. For example, 38 decimal digits can be encoded in 127 bits, or 71 digits can be encoded in 237.

    In contrast, Chen-Ho encoding requires that the compressed representation be a multiple of 10 bits and hence that the number of decimal digits always be a multiple of three.

  2. The encodings for one or two decimal digits are right-aligned in the ten bits (the remaining bits being 0). This means that encoded decimal numbers can be expanded into a longer field simply by padding with zero bits; no re-encoding is necessary.

    In contrast, Chen-Ho encoding requires that expanding an encoded two digits into a three-digit field (for example) would need a re-encoding instead of simple padding.

  3. The positioning and choice of the indicator bits allow all single-digit numbers (indeed, all numbers in the range 0 through 79) to have the same right-aligned encoding as in BCD. This makes conversions of common small numbers trivial.

    In contrast, Chen-Ho encoding only maps the numbers 0 through 7 unchanged.

These advantages make the new encoding a better choice than Chen-Ho encoding for both hardware and software representations of decimal numbers.

Here are some examples:
BCD Chen-Ho Densely Packed
005 000 000 0101 000 000 0101
009 110 000 0001 000 000 1001
055 000 010 1101 000 101 0101
099 111 000 1001 000 101 1111
555 010 110 1101 101 101 0101
999 111 111 1001 001 111 1111
Note that, necessarily, neither encoding is directly sortable. For example, if the encoded values are sorted in ascending order as though they were binary strings and then decoded to BCD, the BCD results will not be in ascending order.


Details

The new encoding, like Chen-Ho encoding, considers each of the three digits as either being small (0–7, requiring 3 bits to distinguish) or large (8 or 9, requiring one bit). The top bit of each BCD digit is 0 for small values, and 1 for the two large values.

The possible combinations of these ranges are then:

a) All digits are small (51.2% of the possibilities).
This requires 3+3+3 bits for the digits, leaving 1 bit to indicate this combination.
b) Two digits are small (38.4%).
This requires 3+3+1 bits for the digits, leaving 3 to indicate this combination.
c) One digit is small (9.6%).
This requires 3+1+1 bits for the digits, leaving 5 to indicate this combination.
d) No digits are small (0.8%).
This requires 1+1+1 bits for the digits, leaving 7 (only 5 are needed) for the indication.

The following tables fully describe the encoding (compression from BCD) and decoding (expansion to BCD); the letters a–k and m represent the 12 bits of three BCD digits, and p–y represent the 10 bits of the encoded digits.

Compression: (abcd)(efgh)(ijkm) becomes (pqr)(stu)(v)(wxy)
aei pqr stu v wxy Comments
000
001
010
100
110
101
011
111
bcd fgh 0 jkm
bcd fgh 1 00m
bcd jkh 1 01m
jkd fgh 1 10m
jkd 00h 1 11m
fgd 01h 1 11m
bcd 10h 1 11m
00d 11h 1 11m
All digits are small
Right digit is large [this keeps 0-9 unchanged]
Middle digit is large
Left digit is large
Right digit is small [L & M are large]
Middle digit is small [L & R are large]
Left digit is small [M & R are large]
All digits are large; two bits are unused
(24 of the 1024 values are redundant and could be used for future extensions; however they are generally mapped to specific combinations of 8s and 9s as defined by the equations in the DPD paper.)

Expansion: (pqr)(stu)(v)(wxy) becomes (abcd)(efgh)(ijkm)
vwxst abcd efgh ijkm
0....
100..
101..
110..
11100
11101
11110
11111
0pqr 0stu 0wxy
0pqr 0stu 100y
0pqr 100u 0sty
100r 0stu 0pqy
100r 100u 0pqy
100r 0pqu 100y
0pqr 100u 100y
100r 100u 100y
('.' means don't-care.)
In hardware, simple boolean operations on input bits define each output bit (for example, during compression, r=d and v=a+e+i). The operations can be derived from the tables, and also appear as boolean expressions in the sample code below.

For software, table lookup is a simple and efficient solution. Below is a Rexx program which is easy to modify for generating table constants (or it can be ported to a different language).
/* dpdGenerate.rex -- display Densely Packed Decimal table    */
/* mfc 2000.10.03; Rexx version with new equations 2007.02.01 */

do i=0 to 999
  bcd=right(i, 3, 0)      -- make three-digit hexadecimal string
  bit10=bcd2dpd(x2b(bcd)) -- compress
  bit12=dpd2bcd(bit10)    -- expand
  say bcd bit10 bit12     -- display
  end i
exit

/* bcd2dpd -- Compress BCD to Densely Packed Decimal
   argument is a string of 12 characters, each 0 or 1, being 3 digits
            of 4 bits, each being a valid BCD digit (0000-1001)
            (for example, 923 is 100100100011)
   result   is a string of 10 characters, each 0 or 1
            (for the example, this would be 0110101101)
   */
bcd2dpd: procedure
  -- assign each bit to a variable, named as in the description
  parse arg a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 m +1

  -- derive the result bits, using boolean expressions only
  -- [the operators are: '&'=AND, '|'=OR, '\'=NOT.]
  p=b | (a & j) | (a & f & i)
  q=c | (a & k) | (a & g & i)
  r=d
  s=(f & (\a | \i)) | (\a & e & j) | (e & i)
  t=g  | (\a & e &k) | (a & i)
  u=h
  v=a | e | i
  w=a | (e & i) | (\e & j)
  x=e | (a & i) | (\a & k)
  y=m
  -- concatenate the bits and return
  return p||q||r||s||t||u||v||w||x||y

/* dpd2bcd -- Expand Densely Packed Decimal to BCD
   argument is a string of 10 characters, each 0 or 1; all 1024
            possibilities are accepted (non-canonicals -> various)
            (for example, 0110101101)
   result   is a string of 12 characters, each 0 or 1
            (for the example, 100100100011 -- 923)
   */
dpd2bcd: procedure
  -- assign each bit to a variable, named as in the description
  parse arg p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1

  -- derive the result bits, using boolean expressions only
  a= (v & w) & (\s | t | \x)
  b=p & (\v | \w | (s & \t & x))
  c=q & (\v | \w | (s & \t & x))
  d=r
  e=v & ((\w & x) | (\t & x) | (s & x))
  f=(s & (\v | \x)) | (p & \s & t & v & w & x)
  g=(t & (\v | \x)) | (q & \s & t & w)
  h=u
  i=v & ((\w & \x) | (w & x & (s | t)))
  j=(\v & w) | (s & v & \w & x) | (p & w & (\x | (\s & \t)))
  k=(\v & x) | (t & \w & x) | (q & v & w & (\x | (\s & \t)))
  m=y
  -- concatenate the bits and return
  return a||b||c||d||e||f||g||h||i||j||k||m
Note: versions of this web page prior to 2007.02.01 (with the NetRexx code example) used equations for dpd2bcd which assumed the output was “don’t care” for the 24 redundant (non-canonical) codes. The equations here result in specific sequences of 8s and 9s for each redundant code, as specified in the DPD paper.


Acknowledgement
Many thanks to Mark Erle and Brian Hickmann for suggestions for improvements to this page and for their careful checking.
Summary by Mike Cowlishaw (mfc@speleotrove.com), 3 October 2000, updated 20 May 2001, 26 April 2002, 6 July 2002, 8 January 2003, 16 July 2005, 13 February 2007.
Copyright © IBM Corporation and Mike Cowlishaw, 2000, 2007. All rights reserved.