Decimal Performance, version 1.12
Copyright (c) IBM Corporation, 2009. All rights reserved. Draft of 7 Apr 2009.
IEEE 754 specifies two variants of the encoding for decimal data; one with a decimal significand and the other with a binary significand. Each of the libraries measured supports one of these encodings (in various ways), and the performance measurements here use the encoding best suited to each library.
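To make the binary-significand variant concrete, the following is a minimal sketch of how a 64-bit value in that encoding splits into sign, exponent, and coefficient. The type `bid64_parts` and the functions `bid64_encode` and `bid64_decode` are hypothetical names introduced for illustration; they are not part of any library measured here. The sketch assumes the standard decimal64 layout (exponent bias 398) and does not check for non-canonical coefficients.

```c
#include <assert.h>
#include <stdint.h>

#define BID64_BIAS 398  /* exponent bias of the decimal64 format */

/* Hypothetical helper type for this sketch only. */
typedef struct {
    int      sign;        /* 1 if negative */
    int      exponent;    /* unbiased power of ten */
    uint64_t coefficient; /* integer significand, held in binary */
    int      special;     /* 1 for infinity or NaN; other fields unused */
} bid64_parts;

/* Build an encoding using the short-coefficient form only
   (coefficient below 2^53, exponent in the decimal64 range). */
static uint64_t bid64_encode(int sign, int exponent, uint64_t coefficient)
{
    assert(sign == 0 || sign == 1);
    assert(exponent >= -398 && exponent <= 369);
    assert(coefficient < (UINT64_C(1) << 53));
    return ((uint64_t)sign << 63)
         | ((uint64_t)(exponent + BID64_BIAS) << 53)
         | coefficient;
}

/* Split an encoding into its parts. */
static bid64_parts bid64_decode(uint64_t bits)
{
    bid64_parts p = {0, 0, 0, 0};
    p.sign = (int)(bits >> 63);
    if (((bits >> 59) & 0xF) == 0xF) {
        /* Bits 62..59 all set: infinity or NaN. */
        p.special = 1;
    } else if (((bits >> 61) & 0x3) == 0x3) {
        /* Long-coefficient form: exponent in bits 60..51 and an
           implied leading '100' in front of the 51 stored bits. */
        p.exponent = (int)((bits >> 51) & 0x3FF) - BID64_BIAS;
        p.coefficient = (UINT64_C(1) << 53)
                      | (bits & ((UINT64_C(1) << 51) - 1));
    } else {
        /* Short-coefficient form: exponent in bits 62..53. */
        p.exponent = (int)((bits >> 53) & 0x3FF) - BID64_BIAS;
        p.coefficient = bits & ((UINT64_C(1) << 53) - 1);
    }
    return p;
}
```

For instance, `bid64_encode(1, -2, 1234)` represents -12.34 as the binary integer 1234 scaled by ten to the power -2. The decimal-significand variant stores the same coefficient as groups of decimal digits instead.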
Comments on this document are welcome. Please send any comments, suggestions, and corrections to the author, Mike Cowlishaw (mfc@speleotrove.com).
The decNumber module is part of the IBM decNumber package;[3] it implements arbitrary-precision arithmetic with fully tailorable parameters (rounding precision, exponent range, and other factors can all be changed at run time). All decNumber operations accept arbitrary-length operands. The decNumber module uses a general-purpose internal format (tunable at compile time) which requires conversions to and from any external format. When working with IEEE 754 encodings, all parameters and results require conversions (each about 100 cycles).
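As a rough worked example of that conversion cost, the sketch below totals the overhead for one operation on encoded data. It assumes the approximate figure of 100 cycles per conversion quoted above; the function name `conversion_overhead_cycles` is hypothetical and the result is an estimate, not a measurement.

```c
#include <assert.h>

/* Approximate figure from the text: about 100 cycles per conversion. */
#define APPROX_CYCLES_PER_CONVERSION 100u

/* Estimated conversion overhead for one operation whose operands and
   results all travel through an external encoding. */
static unsigned conversion_overhead_cycles(unsigned operands, unsigned results)
{
    /* Sanity bound for this sketch; real operations take few arguments. */
    assert(operands + results < 1000u);
    return (operands + results) * APPROX_CYCLES_PER_CONVERSION;
}
```

Under this assumption a two-operand operation such as an addition carries roughly 300 cycles of conversion (two operands in, one result out) on top of the arithmetic itself.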
The decFloats modules are also part of the decNumber package; they work directly on the fixed-size encodings with decimal significand. This document gives results for the decDouble and decQuad modules (64-bit and 128-bit formats).
The Intel[4] Decimal Floating-Point Library (IDFPL) is an Intel Software Development Product.[5] The functions in the library work directly on the fixed-size encodings with binary significand (64-bit and 128-bit formats).
All three implementations are open source and written in C.
The decNumber and decFloats implementations require 32-bit binary integer types only, conform to strict aliasing and alignment rules, and are tested for use on both little-endian and big-endian architectures. They support string conversions for both ASCII/UTF8 and EBCDIC, BCD conversions, and decimal integer operations (integer divide, shift, rotate, logical and, or, xor, etc.).
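For readers unfamiliar with BCD, the following is a minimal sketch of what a conversion to and from packed BCD involves: two decimal digits are stored per byte, one in each 4-bit nibble. The functions `u32_to_packed_bcd` and `packed_bcd_to_u32` are hypothetical names for illustration and are not the decNumber API; the sketch handles unsigned values only and omits the sign nibble that packed formats often carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack value into len bytes, most significant digits first,
   right-aligned and zero-padded.
   Returns 0 on success, -1 if value needs more than 2*len digits. */
static int u32_to_packed_bcd(uint32_t value, uint8_t *out, size_t len)
{
    assert(out != NULL);
    for (size_t i = len; i-- > 0; ) {
        uint8_t lo = (uint8_t)(value % 10);
        value /= 10;
        uint8_t hi = (uint8_t)(value % 10);
        value /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return value == 0 ? 0 : -1;
}

/* Unpack len bytes of packed BCD into *value.
   Returns 0 on success, -1 on an invalid digit or overflow. */
static int packed_bcd_to_u32(const uint8_t *in, size_t len, uint32_t *value)
{
    uint64_t acc = 0;
    assert(in != NULL && value != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned hi = in[i] >> 4;
        unsigned lo = in[i] & 0x0F;
        if (hi > 9 || lo > 9)
            return -1;              /* nibble is not a decimal digit */
        acc = acc * 100 + hi * 10 + lo;
        if (acc > UINT32_MAX)
            return -1;              /* does not fit in 32 bits */
    }
    *value = (uint32_t)acc;
    return 0;
}
```

Packing 1234 into three bytes with this sketch gives the bytes 0x00, 0x12, 0x34, so the digits are readable directly in hexadecimal.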
The IDFPL implementation requires 64-bit binary integer and floating-point types, and is assumed to be little-endian and ASCII/UTF8 only (the README files do not refer to big-endian[6] or EBCDIC support). BCD conversions and decimal integer operations are not supported by the IDFPL implementation.
Timings depend on the hardware used. The times below are cycles measured on an Intel Pentium M processor in an IBM X41T Thinkpad;[7] on a Pentium 4 or RISC processor most of the tests would show significantly higher cycle counts. The compiler used also makes a measurable difference, so all the cycle counts were measured using the same hardware, compiler, and compiler options (detailed in the notes in the next section).
In the tables, worst-case cycle times are shown for each operation for the decFloats modules (in the column headed decDouble or decQuad), the IDFPL library (headed idfpl64 or idfpl128), and the decNumber module (headed decNum).
Worst-case timings are quoted because best-case timings are generally trivial special cases (such as NaN arguments) and ‘typical’ instruction mixes are too application-dependent to be generally applicable.
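The idea of quoting a worst case can be sketched as a small harness that times each test case separately and keeps the slowest. This is an illustration only and is not the harness used for the measurements in this document: the names `worst_case_ticks` and `sample_op` are hypothetical, and the portable `clock()` function stands in for a processor cycle counter, so the result is in clock ticks rather than cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* An operation under test, selected by test-case index. */
typedef void (*op_fn)(size_t case_index);

/* Time each case over reps repetitions and return the largest mean
   cost per call, in clock() ticks. If worst is non-NULL it receives
   the index of the slowest case. */
static double worst_case_ticks(op_fn op, size_t cases,
                               unsigned long reps, size_t *worst)
{
    double max_ticks = 0.0;
    assert(op != NULL && cases > 0 && reps > 0);
    if (worst != NULL)
        *worst = 0;
    for (size_t c = 0; c < cases; c++) {
        clock_t start = clock();
        for (unsigned long r = 0; r < reps; r++)
            op(c);
        clock_t stop = clock();
        double per_call = (double)(stop - start) / (double)reps;
        if (per_call > max_ticks) {
            max_ticks = per_call;
            if (worst != NULL)
                *worst = c;
        }
    }
    return max_ticks;
}

/* Stand-in workload: higher case indexes do more work. */
static volatile unsigned long sink;
static void sample_op(size_t case_index)
{
    for (size_t i = 0; i < (case_index + 1) * 100; i++)
        sink += i;
}
```

A call such as `worst_case_ticks(sample_op, 3, 1000, &worst)` reports the cost of the slowest of three cases. Reporting the maximum rather than the mean keeps trivial special cases from flattering the result.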
For each operation, the name of the operation is given, along with a brief description of the worst-case form of the operation. This is the worst case for the decFloats modules (in some cases the worst case is different for the other modules).
[1] See http://speleotrove.com/decimal/decarith.html
[2] IEEE Std 754-2008 – IEEE Standard for Floating-Point Arithmetic, The Institute of Electrical and Electronics Engineers, Inc., New York, 2008.
[3] See http://speleotrove.com/decimal/#decNumber
[4] ‘Intel’ is a trade mark of the Intel Corporation.
[5] See http://www.intel.com/cd/software/products/asmo-na/eng/219861.htm
[6] In version 1.0 there are said to be references in the code to ENDIAN values, so some support may be present.
[7] ‘Pentium’ is a trade mark of the Intel Corporation. ‘Thinkpad’ is a trade mark of Lenovo.
[8] The most recent decFloats modules support Packed BCD directly; however, these conversions have not yet been benchmarked.