A Summary of Chen-Ho Decimal Data encoding


This document summarizes the decimal data encoding presented in the paper:
Storage-Efficient Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho, in CACM (18)1, pp.49-52, January 1975.
Abstract: Usually n decimal digits are represented by 4n bits in computers. Actually, two BCD digits can be compressed optimally and reversibly into 7 bits, and three digits into 10 bits, by a very simple algorithm based on the fixed-length combination of two variable field-length encodings. In over half of the cases the compressed code results from the conventional BCD code by simple removal of redundant 0 bits. A long decimal message can be subdivided into three-digit blocks, and separately compressed; the result differs from the asymptotic minimum length by only 0.34 percent. The hardware requirement is small, and the mappings can be done manually.

A refinement to Chen-Ho encoding called Densely Packed Decimal was invented in 2002, and that is the decimal encoding used in the IEEE 754 standard (‘754-2008’) and the ISO/IEC/IEEE 60559:2011 international standard.

See also: Decimal Number Compression, Tien Chi Chen, Internal IBM memo to Dr. Irving T. Ho, 4pp, IBM, 29 March 1971.


Chen-Ho encoding is one of many possible encodings for decimal digits; it encodes three decimal digits in 10 bits with a 0.34% wastage, giving a 20% more efficient encoding than simple BCD (one digit in 4 bits). This compression allows a 33-digit decimal number with a three-digit scale or exponent to be held in a 128-bit representation.

The specific encoding preserves much of the identity of the three decimal digits, and allows simple processing; it does not require multiplications or divisions to encode or decode to or from BCD.

In brief, the encoding considers each of the three digits as either being small (0-7, requiring 3 bits to distinguish) or large (8 or 9, requiring one bit). The top bit of each BCD digit is 0 for small values, and 1 for large values.

The possible combinations of these ranges are then:

a) All digits are small (51.2% of the possibilities).
This requires 3+3+3 bits for the digits, leaving 1 bit to indicate this combination.
b) Two digits are small (38.4%).
This requires 3+3+1 bits for the digits, leaving 3 to indicate this combination.
c) One digit is small (9.6%).
This requires 3+1+1 bits for the digits, leaving 5 to indicate this combination.
d) No digits are small (0.8%).
This requires 1+1+1 bits for the digits, leaving 7 (only 5 needed) this combination.

The following tables fully describe the encoding (compression) and decoding (expansion to BCD); a-l represent the 12 bits of three BCD digits, and p-y represent the 10 bits of the encoded digits.

Compression: (abcd)(efgh)(ijkl) becomes (p)(qrs)(tuv)(wxy)
aei p qrs tuv wxy
000
100
010
001
011
101
110
111
0 bcd fgh jkl
1 00d fgh jkl
1 01d bch jkl
1 10d fgh bcl
1 11d 00h bcl
1 11d 01h fgl
1 11d 10h jkl
1 11d 11h 00l

Expansion: (p)(qrs)(tuv)(wxy) becomes (abcd)(efgh)(ijkl)
pqrtu abcd efgh ijkl
0....
100..
101..
110..
11100
11101
11110
11111
0qrs 0tuv 0wxy
100s 0tuv 0wxy
0tus 100v 0wxy
0wxs 0tuv 100y
0wxs 100v 100y
100s 0wxv 100y
100s 100v 0wxy
100s 100v 100y
('.' means don't-care)
In hardware, simple boolean operations on input bits define each output bit (for example, during compression: s=d, and p=a+e+i). Full details are in the referenced paper, and also appear as the boolean expressions in the sample code below.

For software, table lookup is a simple and efficient solution. Below is a NetRexx program which can be directly modified for generating table constants (or ported to a different language).

  /* chenho.nrx -- Sample conversion methods for Chen-Ho encoding */
  /* mfc 2000.06.15 */

  loop i=0 to 999
    bcd=i.right(3, 0)      -- make three-digit hexadecimal string
    bit10=bcd2ch(bcd.x2b)  -- compress
    bit12=ch2bcd(bit10)    -- expand
    say bcd bit10 bit12    -- display
  end i

  exit
  /* bcd2ch -- Compress BCD to Chen-Ho
     argument is a string of 12 characters, each 0 or 1
              (for example, 923 is 100100100011)
     result   is a string of 10 characters, each 0 or 1
              (for the example, this would be 1001010011)
     */
  method bcd2ch(bcd) static
    -- assign each bit to a variable, named as in paper
    parse bcd a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 l +1

    -- derive the result bits, using the boolean expressions in Table III(a).
    -- [the operators are: '&'=AND, '|'=OR, '\'=NOT.]
    p=a | e | i
    q=(b & \e) | i | (a & e)
    r=(c & \i) | e | (a & i)
    s=d
    t=(a & e) | (f & (\a | \i)) | (b & e & \i)
    u=(a & i) | (c & e & \i) | (g & \e)
    v=h
    w=j | (b & i) | (f & a & i)
    x=k | (c & i) | (g & a & i)
    y=l

    -- concatenate the bits and return
    return p||q||r||s||t||u||v||w||x||y
  /* ch2bcd -- Expand Chen-Ho to BCD
     argument is a string of 10 characters, each 0 or 1
              (for example, 1001010011)
     result   is a string of 12 characters, each 0 or 1
              (for the example, 100100100011 -- 923)
     */
  method ch2bcd(ch) static
    -- assign each bit to a variable, named as in paper
    parse ch p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1

    -- derive the result bits, using the boolean expressions in Table III(b).
    a=(p & \q & \r) | (p & q & r & (t | u))
    b=(q & \p) | (t & p & \q & r) | (w & q & (\r | (\t & \u)))
    c=(r & (\p | (\q & u))) |   (x & p & q & (\r | (\t & \u)))
    d=s
    e=(p & r & (\q | \u | t))
    f=(t & (\p | \r)) | (p & q & r & \t & u & w)
    g=(u & (\p | \r)) | (p & q & r & \t & u & x)
    h=v
    i=(p & q & (\r | \t | u))
    j=(w & (\p | \q | (r & t)))
    k=(x & (\p | \q | (r & t)))
    l=y

    -- concatenate the bits and return
    return a||b||c||d||e||f||g||h||i||j||k||l


Summary by Mike Cowlishaw, June 2000.
Copyright © Mike Cowlishaw, 2000, 2014. All rights reserved.
Parts Copyright © IBM Corporation, 2000. All rights reserved.