The decNumber Library, version 3.68
Copyright (c) IBM Corporation, 2010. All rights reserved.
23 Jan 2010

Appendix A – Library performance

The decNumber module implements arbitrary-precision arithmetic with fully tailorable parameters (rounding precision, exponent range, and other factors can all be changed at run time). All decNumber operations can accept arbitrary-length operands. Further, decNumber uses a general-purpose internal format (tunable at compile time) which therefore requires conversions to and from any external format (such as strings, BCD, or the IEEE 754 fixed-size decimal encodings).
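With the default tuning listed in the notes at the end of this appendix (DECDPUN=3), decNumber stores a coefficient as an array of units, each holding three decimal digits, with the least-significant unit first. The sketch below is a simplified illustration of that packing step, not the library's own code: `split_units`, `DPUN` and `MAX_UNITS` are hypothetical names, and the input is assumed to be a plain string of decimal digits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DPUN 3        /* digits per unit; mirrors the default DECDPUN=3 */
#define MAX_UNITS 16  /* hypothetical capacity: up to 48 digits */

/* Split a string of decimal digits into base-1000 units, least-significant
   unit first (the order decNumber uses for its coefficient).  Returns the
   number of units written, or 0 if the string is empty, contains a
   non-digit, or needs more than MAX_UNITS units. */
size_t split_units(const char *digits, uint16_t units[MAX_UNITS]) {
    size_t len = strlen(digits);
    size_t nunits = (len + DPUN - 1) / DPUN;   /* round up */
    if (len == 0 || nunits > MAX_UNITS) return 0;
    for (size_t u = 0; u < nunits; u++) {
        /* Unit u holds the digits in [start, end), counted from the right. */
        size_t end = len - u * DPUN;
        size_t start = end > DPUN ? end - DPUN : 0;
        uint16_t value = 0;
        for (size_t i = start; i < end; i++) {
            if (digits[i] < '0' || digits[i] > '9') return 0;
            value = (uint16_t)(value * 10 + (digits[i] - '0'));
        }
        units[u] = value;
    }
    return nunits;
}
```

With these definitions a 16-digit decDouble coefficient occupies six units and a 34-digit decQuad coefficient twelve; filling and emptying such arrays is part of the conversion cost that the decFloats modules avoid.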

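As a result, the module has significant overheads compared to the dedicated decFloats modules, which work directly on the fixed-size encodings. This appendix compares the performance of the decNumber module with the decDouble and decQuad implementations of the same operations. As the tables below show, the decFloats modules are faster in almost every case: for the computations, decNumber takes roughly 1.5 to 6 times as many cycles, and for simple operations such as the logical and copy operations it takes more than ten times as many. There is therefore a significant performance advantage in using the decFloats modules when arbitrary-precision operations are not required.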


Description of the tables

In the following tables, timings for each operation are given in processor clock cycles. Cycle counts are generally a more useful indicator of comparative performance than ‘wall clock’ times, but they vary considerably with processor architecture. The times below were measured on an Intel Pentium M processor in an IBM X41T Thinkpad;[1] on a Pentium 4 or RISC processor most of the tests would show significantly higher cycle counts. The compiler used also makes a measurable difference. Details of the tests and compiler are given in the notes at the end of this appendix.

Throughout the tables, worst-case cycle times are shown for the main operations in the decDouble and decQuad modules, compared with the same operations using the decNumber module (which requires conversion of operands and results).

Worst-case timings are quoted because best-case timings are generally trivial special cases (such as NaN arguments) and ‘typical’ instruction mixes are very application-dependent.

For each operation, the name of the operation is given, along with a brief description of the worst-case form of the operation. This is the worst case for the decFloats module (in some cases the worst case is different for the decNumber module).
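The appendix does not describe how the timings were collected. One common micro-benchmark pattern, shown here as a hypothetical sketch rather than the harness actually used, times each candidate input many times, keeps the smallest count (which filters out interrupts and cache misses), and reports the largest of those per-input minima as the worst case. On x86 with GCC or Clang it reads the time-stamp counter via `__rdtsc()`, which on newer processors may tick at a fixed reference rate rather than the core clock; elsewhere it falls back to nanoseconds from `clock_gettime`. All function names are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#endif

/* Read a monotonically increasing tick count: TSC ticks on x86 (GCC/Clang),
   nanoseconds elsewhere.  No serializing instruction is used, so very short
   operations are measured only approximately. */
static uint64_t read_cycles(void) {
#ifdef HAVE_RDTSC
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

typedef void (*bench_op)(int input);

/* Smallest tick count for one input over `reps` runs (reps must be >= 1). */
uint64_t min_cycles(bench_op op, int input, int reps) {
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < reps; r++) {
        uint64_t t0 = read_cycles();
        op(input);
        uint64_t t1 = read_cycles();
        if (t1 - t0 < best) best = t1 - t0;
    }
    return best;
}

/* Worst case across candidate inputs: the largest of the per-input minima. */
uint64_t worst_case_cycles(bench_op op, const int *inputs, size_t n, int reps) {
    uint64_t worst = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t c = min_cycles(op, inputs[i], reps);
        if (c > worst) worst = c;
    }
    return worst;
}

/* Stand-in operation whose cost grows with its input; a real benchmark
   would call, for example, a decDouble add on operands of a given shape. */
void busy_loop(int n) {
    volatile int sink = 0;
    for (int i = 0; i < n; i++) sink += i;
}
```

Taking the minimum per input and the maximum across inputs separates two concerns: measurement noise, which only ever adds time, and the choice of operand shape, which is what the ‘worst case’ columns describe.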


decDouble performance tables

decDouble (64-bit) conversions
  Operation                              Worst case                            decDouble   decNumber
  Encoding to BCD (with exponent)        16-digit finite                              39         481
  BCD to encoding (with exponent)        16-digit finite                              46         327
  Encoding to string                     16-digit, with exponent                      84         133
  Exact string to encoding (unrounded)   16-digit, with exponent                     229         196
  String to encoding (rounded)           16-digit, rounded, with exponent            266         548
  Widen to decQuad                       16-digit, with exponent                      30         209
  int32 to encoding                      From most negative int                       39         199
  Encoded integer to int32               To most negative int32                       32         136

decDouble (64-bit) miscellaneous operations
  Operation                              Worst case                            decDouble   decNumber
  Class (classify datum)                 Negative small subnormal                     37         113
  Copies (Abs/Negate/Sign)               CopySign, copy needed                        25         338
  Count significant digits               Single digit                                 24         122
  Logical And/Or/Xor/Invert (digitwise)  16-digit                                     23         510
  Shift/Rotate                           Rotate 15 digits                            154         583

decDouble (64-bit) computations
  Operation                              Worst case                            decDouble   decNumber
  Add (same-sign addition)               16-digit, unaligned, rounded                248         848
  Subtract (different-signs addition)    16-digit, unaligned, rounded, borrow        288
  Compare                                16-digit, unaligned, mismatch at end        126         442
  CompareTotal                           16-digit, unaligned, mismatch at end        149         594
  Divide                                 16- by 16-digit (rounded)                   828        1576
  FMA (fused multiply-add)               16-digit, subtraction, rounded              785        1683
  LogB (returns a decDouble)             Negative result                              48         279
  MaxNum/MinNum                          16-digit, unaligned, mismatch at end        155         656
  Multiply                               16×16-digit, round needed                   362        1305
  Quantize                               16-digit, round all-nines                   112         422
  ScaleB (from decDoubles)               Underflow                                   212         513
  To integral value                      16-digit, round all-nines                   135         709
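To put the decDouble computation figures in proportion, the sketch below divides each decNumber count by the corresponding decDouble count. The cycle counts are copied from the computations table above; Subtract is left out because no decNumber figure is given for it. The struct, array and function names are hypothetical.

```c
#include <stddef.h>

/* One row of the decDouble computations table: worst-case cycle counts. */
struct timing_row {
    const char *operation;
    unsigned decfloat;   /* decDouble module */
    unsigned decnumber;  /* decNumber module, including conversions */
};

static const struct timing_row decdouble_ops[] = {
    {"Add", 248, 848},           {"Compare", 126, 442},
    {"CompareTotal", 149, 594},  {"Divide", 828, 1576},
    {"FMA", 785, 1683},          {"LogB", 48, 279},
    {"MaxNum/MinNum", 155, 656}, {"Multiply", 362, 1305},
    {"Quantize", 112, 422},      {"ScaleB", 212, 513},
    {"To integral value", 135, 709},
};
#define N_DECDOUBLE_OPS (sizeof decdouble_ops / sizeof decdouble_ops[0])

/* How many times as many cycles decNumber needs as the decFloat module. */
double slowdown(const struct timing_row *row) {
    return (double)row->decnumber / (double)row->decfloat;
}

/* Index of the row with the smallest (want_max == 0) or largest ratio;
   n must be at least 1. */
size_t extreme_row(const struct timing_row *rows, size_t n, int want_max) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        double r = slowdown(&rows[i]), b = slowdown(&rows[best]);
        if (want_max ? r > b : r < b) best = i;
    }
    return best;
}
```

Over these rows the decNumber path needs between about 1.9 times (Divide, 1576/828) and 5.8 times (LogB, 279/48) as many cycles as decDouble; the gap is smallest where the arithmetic itself dominates and largest where the operation is cheap compared with the conversions.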


decQuad performance tables

decQuad (128-bit) conversions
  Operation                              Worst case                              decQuad   decNumber
  Encoding to BCD (with exponent)        34-digit finite                              53         460
  BCD to encoding (with exponent)        34-digit finite                              74         307
  Encoding to string                     34-digit, with exponent                     183         239
  Exact string to encoding (unrounded)   34-digit, with exponent                     297         597
  String to encoding (rounded)           34-digit, rounded, with exponent            451         956
  Narrow to decDouble                    34-digit, all nines                         140         612
  int32 to encoding                      From most negative int                       44         199
  Encoded integer to int32               To most negative int32                       32         156

decQuad (128-bit) miscellaneous operations
  Operation                              Worst case                              decQuad   decNumber
  Class (classify number)                Negative small subnormal                     53         133
  Copies (Abs/Negate/Sign)               CopySign, copy needed                        27         380
  Count significant digits               Single digit                                 27         138
  Logical And/Or/Xor/Invert (digitwise)  34-digit                                     27         622
  Shift/Rotate                           Rotate 33 digits                            222         812

decQuad (128-bit) computations
  Operation                              Worst case                              decQuad   decNumber
  Add (same-sign addition)               34-digit, aligned                           433        1180
  Subtract (different-signs addition)    34-digit, unaligned, rounded, borrow        457
  Compare                                34-digit, unaligned, mismatch at end        187        1125
  CompareTotal                           34-digit, unaligned, mismatch at end        238         778
  Divide                                 34- by 34-digit (rounded)                  2018        3172
  FMA (fused multiply-add)               34-digit, subtraction, rounded             1622        2707
  LogB (returns a decQuad)               Negative result                              58         299
  MaxNum/MinNum                          34-digit, unaligned, mismatch at end        241         857
  Multiply                               34×34-digit, round needed                   821        2235
  Quantize                               34-digit, round all-nines                   209         670
  ScaleB (from decQuads)                 Underflow                                   263         553
  To integral value                      34-digit, round all-nines                   233         886


Notes

The following notes apply to all the tables in this appendix.
  1. All timings were made on an IBM X41T Tablet PC (Pentium M, 1.5GHz, 1.5GB RAM) under Windows XP Tablet Edition with SP2; the modules were compiled using GCC version 3.4.4 with optimization settings -O3 -march=i686.
  2. The default tuning parameters were used (DECUSE64=1, DECDPUN=3, etc.); some of these only affect decNumber.
  3. Timings include call/return overhead, and for the decNumber module also include the costs of converting operand(s) to decNumbers and results back to the appropriate format using the decimal64 or decimal128 module.
  4. ‘BCD’ for decNumber is Packed BCD, using the decPacked module; for decFloats it is 8-bit BCD.
  5. The worst case for each operation is not always obvious from the code and is implementation-dependent (for example, in the decFloats modules, an unaligned add is sometimes faster than an aligned add). There may be unusual cases that are slower than the decFloats counts listed above, although a wide variety of micro-benchmarks have been tried.
  6. A string-to-number conversion can theoretically have an arbitrarily large worst case as the string could contain any number of leading, trailing, or embedded zeros; the timings above measured cases where the input string’s coefficient had up to eight more digits than the precision of the destination format.
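A rounded string-to-encoding conversion must discard any coefficient digits beyond the destination precision (16 digits for decDouble, 34 for decQuad), which is why note 6 bounds the test inputs. The hypothetical helpers below, which are not part of decNumber, count the coefficient digits in a numeric string (ignoring the sign, leading zeros, the decimal point and any exponent) and report how many would have to be rounded away. They assume the string is already syntactically valid and do not handle special values such as NaN or Infinity.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the coefficient digits in a numeric string, skipping a sign,
   leading zeros and the decimal point, and stopping at an exponent.
   Trailing zeros are kept (they are significant in decimal arithmetic).
   A coefficient made only of zeros counts as one digit. */
size_t coefficient_digits(const char *s) {
    size_t count = 0;
    int seen_nonzero = 0, seen_digit = 0;
    if (*s == '+' || *s == '-') s++;
    for (; *s != '\0' && *s != 'e' && *s != 'E'; s++) {
        if (!isdigit((unsigned char)*s)) continue;   /* skip '.' */
        seen_digit = 1;
        if (*s != '0') seen_nonzero = 1;
        if (seen_nonzero) count++;
    }
    return (seen_digit && count == 0) ? 1 : count;
}

/* Number of digits that rounding must discard to fit `precision` digits. */
size_t excess_digits(const char *s, size_t precision) {
    size_t d = coefficient_digits(s);
    return d > precision ? d - precision : 0;
}
```

For example, a 24-digit coefficient converted to decDouble has eight excess digits, the largest excess measured according to note 6, while leading zeros such as those in "0000123.4500" add scanning work but no coefficient digits.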

Footnotes:
[1] ‘Intel’ and ‘Pentium’ are trade marks of the Intel Corporation. ‘Thinkpad’ is a trade mark of Lenovo.
