## A Summary of Chen-Ho Decimal Data encoding

This document summarizes the decimal data encoding presented in the paper:
 Storage-Efficient Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho, in CACM (18)1, pp.49-52, January 1975. Abstract: Usually n decimal digits are represented by 4n bits in computers. Actually, two BCD digits can be compressed optimally and reversibly into 7 bits, and three digits into 10 bits, by a very simple algorithm based on the fixed-length combination of two variable field-length encodings. In over half of the cases the compressed code results from the conventional BCD code by simple removal of redundant 0 bits. A long decimal message can be subdivided into three-digit blocks, and separately compressed; the result differs from the asymptotic minimum length by only 0.34 percent. The hardware requirement is small, and the mappings can be done manually.

A refinement to Chen-Ho encoding called Densely Packed Decimal was invented in 2002, and that is the decimal encoding used in the IEEE 754 standard (‘754-2008’) and the ISO/IEC/IEEE 60559:2011 international standard.

Chen-Ho encoding is one of many possible encodings for decimal digits; it encodes three decimal digits in 10 bits with a 0.34% wastage, giving a 20% more efficient encoding than simple BCD (one digit in 4 bits). This compression allows a 33-digit decimal number with a three-digit scale or exponent to be held in a 128-bit representation.

The specific encoding preserves much of the identity of the three decimal digits, and allows simple processing; it does not require multiplications or divisions to encode or decode to or from BCD.

In brief, the encoding considers each of the three digits as either being small (0-7, requiring 3 bits to distinguish) or large (8 or 9, requiring one bit). The top bit of each BCD digit is 0 for small values, and 1 for large values.

The possible combinations of these ranges are then:

 a) All digits are small (51.2% of the possibilities). This requires 3+3+3 bits for the digits, leaving 1 bit to indicate this combination. b) Two digits are small (38.4%). This requires 3+3+1 bits for the digits, leaving 3 to indicate this combination. c) One digit is small (9.6%). This requires 3+1+1 bits for the digits, leaving 5 to indicate this combination. d) No digits are small (0.8%). This requires 1+1+1 bits for the digits, leaving 7 (only 5 needed) this combination.

The following tables fully describe the encoding (compression) and decoding (expansion to BCD); a-l represent the 12 bits of three BCD digits, and p-y represent the 10 bits of the encoded digits.

Compression: (abcd)(efgh)(ijkl) becomes (p)(qrs)(tuv)(wxy)
 aei p qrs tuv wxy 000 100 010 001 011 101 110 111 0 bcd fgh jkl 1 00d fgh jkl 1 01d bch jkl 1 10d fgh bcl 1 11d 00h bcl 1 11d 01h fgl 1 11d 10h jkl 1 11d 11h 00l

Expansion: (p)(qrs)(tuv)(wxy) becomes (abcd)(efgh)(ijkl)
 pqrtu abcd efgh ijkl 0.... 100.. 101.. 110.. 11100 11101 11110 11111 0qrs 0tuv 0wxy 100s 0tuv 0wxy 0tus 100v 0wxy 0wxs 0tuv 100y 0wxs 100v 100y 100s 0wxv 100y 100s 100v 0wxy 100s 100v 100y
('.' means don't-care)
In hardware, simple boolean operations on input bits define each output bit (for example, during compression: s=d, and p=a+e+i). Full details are in the referenced paper, and also appear as the boolean expressions in the sample code below.

For software, table lookup is a simple and efficient solution. Below is a NetRexx program which can be directly modified for generating table constants (or ported to a different language).

 ``` /* chenho.nrx -- Sample conversion methods for Chen-Ho encoding */ /* mfc 2000.06.15 */ loop i=0 to 999 bcd=i.right(3, 0) -- make three-digit hexadecimal string bit10=bcd2ch(bcd.x2b) -- compress bit12=ch2bcd(bit10) -- expand say bcd bit10 bit12 -- display end i exit ```
 ``` /* bcd2ch -- Compress BCD to Chen-Ho argument is a string of 12 characters, each 0 or 1 (for example, 923 is 100100100011) result is a string of 10 characters, each 0 or 1 (for the example, this would be 1001010011) */ method bcd2ch(bcd) static -- assign each bit to a variable, named as in paper parse bcd a +1 b +1 c +1 d +1 e +1 f +1 g +1 h +1 i +1 j +1 k +1 l +1 -- derive the result bits, using the boolean expressions in Table III(a). -- [the operators are: '&'=AND, '|'=OR, '\'=NOT.] p=a | e | i q=(b & \e) | i | (a & e) r=(c & \i) | e | (a & i) s=d t=(a & e) | (f & (\a | \i)) | (b & e & \i) u=(a & i) | (c & e & \i) | (g & \e) v=h w=j | (b & i) | (f & a & i) x=k | (c & i) | (g & a & i) y=l -- concatenate the bits and return return p||q||r||s||t||u||v||w||x||y ```
 ``` /* ch2bcd -- Expand Chen-Ho to BCD argument is a string of 10 characters, each 0 or 1 (for example, 1001010011) result is a string of 12 characters, each 0 or 1 (for the example, 100100100011 -- 923) */ method ch2bcd(ch) static -- assign each bit to a variable, named as in paper parse ch p +1 q +1 r +1 s +1 t +1 u +1 v +1 w +1 x +1 y +1 -- derive the result bits, using the boolean expressions in Table III(b). a=(p & \q & \r) | (p & q & r & (t | u)) b=(q & \p) | (t & p & \q & r) | (w & q & (\r | (\t & \u))) c=(r & (\p | (\q & u))) | (x & p & q & (\r | (\t & \u))) d=s e=(p & r & (\q | \u | t)) f=(t & (\p | \r)) | (p & q & r & \t & u & w) g=(u & (\p | \r)) | (p & q & r & \t & u & x) h=v i=(p & q & (\r | \t | u)) j=(w & (\p | \q | (r & t))) k=(x & (\p | \q | (r & t))) l=y -- concatenate the bits and return return a||b||c||d||e||f||g||h||i||j||k||l ```

Summary by Mike Cowlishaw, June 2000.