Floating point operations

The DFPAL Library, version 2.20 © Copyright IBM Corporation, 2007. All rights reserved.

DFPAL supports the 64-bit (precision 16 digits) and 128-bit (precision 34 digits) encoding formats. DFPAL also provides routines that convert between the 32-bit (precision 7 digits) and 64-bit encoding formats; an application can emulate 32-bit arithmetic by combining these conversion routines with 64-bit arithmetic. These encodings are referred to as decimal32, decimal64, and decimal128. decimalNN is a general term for either the decimal64 or the decimal128 encoding format.

Decimal floating point operations are classified into the following categories.

Data type conversion
Arithmetic operations
Miscellaneous: field access, utility, data class

Programming Note
This guide refers to decimalNN...() routines. However, applications are strongly encouraged to program to the decNN...() macros, which allow easy migration to compiler-native decimal floating point using the DFPAL_USE_COMPILER_DFP compile-time switch. Refer to How to compile DFPAL? for more information. The decNN...() macro for each routine is listed in square brackets after its prototype.

Not all of the facilities listed below are part of the current working-draft technical report Extension for the programming language C to support decimal floating-point arithmetic by the ISO/IEC JTC1 SC22 WG14 committee. The proposed extension is still evolving; refer to its current draft. Restricting an application to the facilities in the proposed extension may simplify a later migration to compiler-native decimal floating point.

Data type conversion

DFPAL provides conversions between the decimal floating point formats and many other data types, including the programming language's intrinsic integer and binary floating point types.

char * dfpal_decimalNNToString(const decimalNN rhs, char *out) [decNNToString]

Converts rhs to its equivalent string representation and stores the string in the memory pointed to by out. Returns out. The application must pre-allocate the memory pointed to by out.

decimalNN dfpal_decimalNNFromString(const char *dfpstr) [decNNFromString]

Converts the string dfpstr to a decimalNN number. Exceptions are raised as described in the General Decimal Arithmetic Specification.

double decimalNNToDouble(const decimalNN rhs) [decNNToDouble]

Converts the decimalNN input rhs to binary floating point (double) representation. The conversion may lose precision or be slightly inaccurate, because most decimal fractions have no exact binary representation. An sNaN input is converted to NaN, because a double sNaN value is not portable.
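
Why the conversion can be inexact is visible without DFPAL at all. The sketch below is plain C with no DFPAL calls, and decimal_string_roundtrips() is a hypothetical helper: it parses a decimal string into the nearest double and prints it back with 17 significant digits, which is enough to distinguish any two doubles. A value such as 0.1, which decimal64 holds exactly, does not survive the trip.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a DFPAL API): returns 1 if the decimal string s,
 * converted to the nearest binary double and printed back with 17
 * significant digits, comes out exactly as written; 0 otherwise. */
int decimal_string_roundtrips(const char *s)
{
    char buf[64];
    double d = strtod(s, NULL);                    /* nearest binary double */
    int n = snprintf(buf, sizeof buf, "%.17g", d); /* 17 digits identify any double */
    assert(n > 0 && (size_t)n < sizeof buf);
    return strcmp(buf, s) == 0;
}
```

Printed this way, 0.1 becomes 0.10000000000000001, so decimal_string_roundtrips("0.1") returns 0, while "0.5" is exactly representable in binary and returns 1.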

decimalNN decimalNNFromDouble(const double) [decNNFromDouble]

Convert binary floating point input double value to equivalent decimalNN format.

uint8_t * decimalNNToPackedBCD(decimalNN rhs, uint8_t *bcdOut, int32_t bcdOutLen, int32_t *scale) [decNNToPackedBCD]

Converts rhs to a packed binary coded decimal (BCD) string. bcdOutLen is the length of the array bcdOut; it must be at least 9 bytes (for decimal64) or 18 bytes (for decimal128), otherwise bcdOut is returned unchanged. scale is an output parameter that receives the scale of rhs. The output array is populated right-aligned; that is, the highest array index holds the least significant digit nibble and the sign nibble. The application must pre-allocate enough memory for bcdOut and clear its contents if necessary. The output sign nibble is C (for a positive number) or D (for a negative number).
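
The right-aligned nibble layout can be sketched in plain C. pack_bcd() below is a hypothetical helper, not part of DFPAL: it writes the digits of an unsigned integer coefficient two per byte, ending with the C or D sign nibble in the low half of the last byte. The layout also explains the minimum lengths: 16 digits plus a sign nibble need 17 nibbles (9 bytes), and 34 digits plus a sign need 35 nibbles (18 bytes). The sketch does not handle the scale, and a uint64_t coefficient only covers decimal64-sized inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a DFPAL API): packs coeff into bcd[0..len-1],
 * right-aligned, two digits per byte, with the sign in the low nibble of
 * the last byte (0xC positive, 0xD negative). Returns 0 if len is too small. */
int pack_bcd(uint64_t coeff, int negative, uint8_t *bcd, size_t len)
{
    assert(bcd != NULL && len > 0);
    memset(bcd, 0, len);                       /* unused leading nibbles stay 0 */
    size_t i = len - 1;
    bcd[i] = negative ? 0x0D : 0x0C;           /* sign nibble */
    bcd[i] |= (uint8_t)((coeff % 10) << 4);    /* least significant digit */
    coeff /= 10;
    while (coeff != 0) {
        if (i == 0)
            return 0;                          /* array too short for coeff */
        i--;
        bcd[i] = (uint8_t)(coeff % 10);        /* low nibble */
        coeff /= 10;
        bcd[i] |= (uint8_t)((coeff % 10) << 4); /* high nibble */
        coeff /= 10;
    }
    return 1;
}
```

For example, packing 12345 into a 3-byte array yields the bytes 0x12 0x34 0x5C.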

decimalNN decimalNNFromPackedBCD(uint8_t *bcdIn, int32_t bcdInLen, int32_t scale) [decNNFromPackedBCD]

Converts the input byte array bcdIn to a decimalNN number and returns it. bcdInLen is the length of bcdIn; it must be at least 9 bytes (for decimal64) or 18 bytes (for decimal128), otherwise NaN is returned. The input scale is used to set the exponent of the result. bcdIn is assumed to be right-aligned; that is, the highest array index holds the least significant digit nibble and the sign nibble. The sign nibble must be present, and it must be C (for a positive number) or D (for a negative number). When the input array is longer than 9 bytes (for decimal64) or 18 bytes (for decimal128), it must contain no more than 16 significant digits (for decimal64) or 34 significant digits (for decimal128). The application can pre-process the input array before passing it to decimalNNFromPackedBCD() to ensure that these rules are observed. The rules are imposed to reconcile rounding differences between hardware and software implementations of decimal floating point.
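
decimalNNFromPackedBCD() reads the layout described for decimalNNToPackedBCD() in reverse. unpack_bcd() below is a hypothetical plain-C sketch, not the DFPAL implementation: it checks the sign nibble, reads every remaining nibble as a digit, skips leading zeros, and rejects input with more significant digits than max_digits, mirroring the 16/34-digit rule. Applying scale to set the exponent is left out, and what DFPAL itself does with input that breaks the digit rule is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DFPAL API): unpacks right-aligned packed BCD
 * into *coeff and *negative. Returns 0 for a bad sign or digit nibble, or
 * when more than max_digits significant digits are present. max_digits is
 * limited to 19 so the coefficient fits in uint64_t. */
int unpack_bcd(const uint8_t *bcd, size_t len, int max_digits,
               uint64_t *coeff, int *negative)
{
    assert(bcd != NULL && len > 0 && max_digits > 0 && max_digits <= 19);
    uint8_t sign = bcd[len - 1] & 0x0F;
    if (sign != 0x0C && sign != 0x0D)
        return 0;
    uint64_t c = 0;
    int digits = 0;                            /* significant digits seen */
    for (size_t i = 0; i < 2 * len - 1; i++) { /* every nibble except the sign */
        uint8_t byte = bcd[i / 2];
        uint8_t d = (i % 2 == 0) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        if (d > 9)
            return 0;
        if (c != 0 || d != 0) {                /* leading zeros are not counted */
            if (++digits > max_digits)
                return 0;
            c = c * 10 + d;
        }
    }
    *coeff = c;
    *negative = (sign == 0x0D);
    return 1;
}
```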

decimal32 decimal64ToDecimal32(const decimal64 rhs) [dec64ToDecimal32]

Converts the decimal64 input rhs to decimal32 format. If precision is lost, the INEXACT exception is raised. Note: this function is intended for decimal32 arithmetic emulation. The INVALID exception is not raised for an sNaN input, to avoid side effects.

decimal64 decimal64FromDecimal32(const decimal32 rhs) [dec64FromDecimal32]

Converts the decimal32 input rhs to decimal64 format. No exceptions are possible. Note: this function is intended for decimal32 arithmetic emulation. The INVALID exception is not raised for an sNaN input, to avoid side effects.

General integer conversion rules
For conversion to integer, no INEXACT exception is raised even if digits are lost. Conversion to integer does not use the current rounding mode; it always truncates. (To convert to an integer with the desired rounding mode, use decimalNNToIntegralValue() followed by either decimalNNToIntXX() or decimalNNToUintXX().)

For an out-of-range or infinite input that cannot be represented in the destination format, the integer of largest magnitude with the same sign as the input is returned, and the INVALID exception is raised. For example, when converting decimalNN to a 64-bit signed integer, an input of +Infinity (or, say, 1E+100) returns 9223372036854775807, and -Infinity (or, say, -1E+100) returns -9223372036854775808. The INVALID exception is raised in either case.

For a +/-sNaN or +/-NaN input, the smallest possible integer is returned, regardless of the input sign. For example, when converting the decimalNN value +NaN to a 64-bit signed integer, the return value is -9223372036854775808. The INVALID exception is raised.

For unsigned integer conversion, a negative input is considered out of range; the largest-magnitude unsigned integer in the negative direction is 0, so 0 is returned and the INVALID exception is raised. Similarly, the smallest possible unsigned integer, 0, is returned for a +/-NaN or +/-sNaN input, and the INVALID exception is raised.

When converting from an integer to decimalNN, the current rounding mode is used and the INEXACT exception may be raised, especially when converting a 64-bit integer to decimal64: the maximum precision of decimal64 is 16 digits, whereas a 64-bit integer may need up to 20 digits for exact representation. No other exceptions are possible.

int64_t decimalNNToInt64(const decimalNN rhs) [decNNToInt64]

Convert input rhs to equivalent 64-bit signed integer value. The general integer conversion rules apply. The output 64-bit signed integer is in range [-9223372036854775808, 9223372036854775807].

uint64_t decimalNNToUint64(const decimalNN rhs) [decNNToUint64]

Convert input rhs to equivalent 64-bit unsigned integer value. The general integer conversion rules apply. The output 64-bit unsigned integer is in range [0, 18446744073709551615].

int32_t decimalNNToInt32(const decimalNN rhs) [decNNToInt32]

Convert input rhs to equivalent 32-bit signed integer value. The general integer conversion rules apply. The output 32-bit signed integer is in range [-2147483648, 2147483647].

uint32_t decimalNNToUint32(const decimalNN rhs) [decNNToUint32]

Convert input rhs to equivalent 32-bit unsigned integer value. The general integer conversion rules apply. The output 32-bit unsigned integer is in range [0, 4294967295].

decimalNN decimalNNFromInt64(const int64_t rhs) [decNNFromInt64]

Convert 64-bit signed integer input rhs to equivalent decimalNN format. The general integer conversion rules apply.

decimalNN decimalNNFromUint64(const uint64_t rhs) [decNNFromUint64]

Convert 64-bit unsigned integer input rhs to equivalent decimalNN format. The general integer conversion rules apply.

decimalNN decimalNNFromInt32(const int32_t rhs) [decNNFromInt32]

Convert 32-bit signed integer input rhs to equivalent decimalNN format. The general integer conversion rules apply.

decimalNN decimalNNFromUint32(const uint32_t rhs) [decNNFromUint32]

Convert 32-bit unsigned integer input rhs to equivalent decimalNN format. The general integer conversion rules apply.

Utility functions

uint32_t decNNSign(const decimalNN rhs) [decNNsign]

Return the sign of rhs: a non-zero integer (boolean 1) if rhs is negative, 0 otherwise.

uint32_t decNNComb(const decimalNN rhs) [decNNComb]

Return combination field of rhs.

uint32_t decNNExpCon(const decimalNN rhs) [decNNExpCon]

Return exponent continuation field of rhs.

uint8_t decimalNNIsInfinite(const decimalNN rhs) [decNNIsInfinite]

Returns boolean value 1 if rhs is +/-Infinite, 0 otherwise.

uint8_t decimalNNIsNaN (const decimalNN rhs) [decNNIsNaN]

Returns boolean value 1 if rhs is either +/-qNaN or +/-sNaN, 0 otherwise.

uint8_t decimalNNIsQNaN (const decimalNN rhs) [decNNIsQNaN]

Returns boolean value 1 if rhs is +/-qNaN, 0 otherwise.

uint8_t decimalNNIsSNaN (const decimalNN rhs) [decNNIsSNaN]

Returns boolean value 1 if rhs is +/-sNaN, 0 otherwise.

uint8_t decimalNNIsNegative (const decimalNN rhs) [decNNIsNegative]

Returns boolean value 1 if rhs is a negative number, 0 otherwise. This function is similar to decNNSign().

uint8_t decimalNNIsZero (const decimalNN rhs) [decNNIsZero]

Returns boolean value 1 if rhs numeric value is +/-0, 0 otherwise.

decimalNN decimalNNTrim(const decimalNN rhs) [decNNTrim]

Removes insignificant trailing zeros from rhs. That is, any fractional trailing zeros are removed by dividing the coefficient by the appropriate power of ten and adjusting the exponent accordingly.
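
Trimming can be sketched on a hypothetical (coefficient, exponent) pair whose value is coeff * 10^exp; the dec_sketch type and trim_sketch() are illustrative names, not DFPAL code, and the sign is omitted. Following the description above, only fractional trailing zeros are dropped, so the loop stops as soon as the exponent reaches 0. The sketch leaves a zero coefficient unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation (not a DFPAL type): value = coeff * 10^exp. */
typedef struct {
    uint64_t coeff;
    int32_t exp;
} dec_sketch;

/* Sketch of trimming: drop trailing zeros only while they are fractional,
 * i.e. while the exponent is still negative. */
dec_sketch trim_sketch(dec_sketch x)
{
    while (x.coeff != 0 && x.coeff % 10 == 0 && x.exp < 0) {
        x.coeff /= 10;   /* remove one trailing zero ... */
        x.exp++;         /* ... and compensate in the exponent */
    }
    return x;
}
```

For example, 1.2300 (12300, -4) becomes 1.23 (123, -2), while 1200 (1200, 0) is left unchanged because its zeros are not fractional.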

decimalNN decimalNNZero(void) [decNNZero]

Returns decimalNN value 0 (no sign, coefficient=0, exponent=0), useful for initializing decimalNN to zero.

int32_t decimalNNGetDigits(const decimalNN rhs) [decNNGetDigits]

Return number of coefficient digits of rhs.

int32_t decimalNNGetExponent(const decimalNN rhs) [decNNGetExponent]

Return unbiased exponent of rhs.

Arithmetic operations

decimalNN decimalNNAbs(const decimalNN rhs) [decNNAbs]

The return value is absolute value of decimalNN input rhs.

decimalNN decimalNNAdd(const decimalNN lhs, const decimalNN rhs) [decNNAdd]

Adds decimalNN inputs lhs and rhs and returns the result of the addition.

decimalNN decimalNNCompare(const decimalNN lhs, const decimalNN rhs) [decNNCompare]

Compares decimalNN inputs lhs and rhs and returns decimalNN -1 if lhs is less than rhs, decimalNN 0 if lhs is equal to rhs, and decimalNN +1 if lhs is greater than rhs.

int32_t decimalNNCompareLT(const decimalNN lhs, const decimalNN rhs) [decNNCompareLT]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is less than rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareLE(const decimalNN lhs, const decimalNN rhs) [decNNCompareLE]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is less than or equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareEQ(const decimalNN lhs, const decimalNN rhs) [decNNCompareEQ]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareNE(const decimalNN lhs, const decimalNN rhs) [decNNCompareNE]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is not equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareGT(const decimalNN lhs, const decimalNN rhs) [decNNCompareGT]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is greater than rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareGE(const decimalNN lhs, const decimalNN rhs) [decNNCompareGE]

This is a macro. Compares decimalNN inputs lhs and rhs and returns boolean 1 if lhs is greater than or equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompare() in conditional constructs such as if, for, or while statements.

decimalNN decimalNNCompareTotal(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotal]

Compares decimalNN inputs lhs and rhs and returns decimalNN -1 if lhs is less than rhs, decimalNN 0 if lhs is equal to rhs, and decimalNN +1 if lhs is greater than rhs. Unlike the regular comparison, the total comparison uses the abstract representation of the numbers, so numerically equal values with different exponents, such as 1.0 and 1.00, do not compare as equal. Refer to the General Decimal Arithmetic Specification for more details.
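
The effect of the total order on numerically equal values can be sketched for finite numbers. The dec_sketch type and compare_total_sketch() are hypothetical: numbers are ordered first by sign, then by numeric value, and only numerically equal numbers are ordered by exponent, the smaller exponent first for positive numbers and the reverse for negative numbers. NaNs and infinities, which the specification also places in the total order, are omitted, and coefficients are limited to 19 digits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical finite decimal (not a DFPAL type):
 * value = (negative ? -1 : 1) * coeff * 10^exp. */
typedef struct {
    int negative;
    uint64_t coeff;
    int32_t exp;
} dec_sketch;

static int count_digits(uint64_t c)
{
    int n = 1;
    while (c >= 10) {
        c /= 10;
        n++;
    }
    return n;
}

/* Numeric comparison of |a| and |b|: -1, 0 or +1. */
static int cmp_magnitude(dec_sketch a, dec_sketch b)
{
    if (a.coeff == 0 || b.coeff == 0)
        return (a.coeff != 0) - (b.coeff != 0);
    int da = count_digits(a.coeff), db = count_digits(b.coeff);
    assert(da <= 19 && db <= 19);        /* padding below must not overflow */
    int32_t adj_a = a.exp + da - 1;      /* exponent of the leading digit */
    int32_t adj_b = b.exp + db - 1;
    if (adj_a != adj_b)
        return adj_a < adj_b ? -1 : 1;
    uint64_t ca = a.coeff, cb = b.coeff;
    for (; da < db; da++) ca *= 10;      /* pad the shorter coefficient */
    for (; db < da; db++) cb *= 10;
    return (ca > cb) - (ca < cb);
}

/* Sketch of a total-order comparison for finite numbers: -1, 0 or +1. */
int compare_total_sketch(dec_sketch a, dec_sketch b)
{
    if (a.negative != b.negative)
        return a.negative ? -1 : 1;      /* negatives sort first, even -0 */
    int m = cmp_magnitude(a, b);
    if (m == 0)                          /* same value: smaller exponent first */
        m = (a.exp > b.exp) - (a.exp < b.exp);
    return a.negative ? -m : m;          /* negatives mirror the positive order */
}
```

With this ordering 1.00 sorts before 1.0, whereas decimalNNCompare() reports the two as equal.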

int32_t decimalNNCompareTotalLT(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalLT]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is less than rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareTotalLE(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalLE]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is less than or equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareTotalEQ(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalEQ]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareTotalNE(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalNE]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is not equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareTotalGT(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalGT]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is greater than rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

int32_t decimalNNCompareTotalGE(const decimalNN lhs, const decimalNN rhs) [decNNCompareTotalGE]

This is a macro. Compares decimalNN inputs lhs and rhs using total order and returns boolean 1 if lhs is greater than or equal to rhs, 0 otherwise. This macro is convenient and fast, and is recommended over decimalNNCompareTotal() in conditional constructs such as if, for, or while statements.

decimalNN decimalNNDivide(const decimalNN lhs, const decimalNN rhs) [decNNDivide]

Divides lhs by rhs and returns the result.

decimalNN decimalNNDivideInteger(const decimalNN lhs, const decimalNN rhs) [decNNDivideInteger]

Divides lhs by rhs and returns the integer part of the quotient.

decimalNN decimalNNExp(const decimalNN rhs) [decNNExp]

The return value is e raised to the power of rhs. Refer to the decNumber module for more information.

decimalNN decimalNNLn(const decimalNN rhs) [decNNLn]

The return value is the natural logarithm (logarithm in base e) of rhs. Refer to the decNumber module for more information.

decimalNN decimalNNLog10(const decimalNN rhs) [decNNLog10]

The return value is the logarithm in base ten of rhs. Refer to the decNumber module for more information.

decimalNN decimalNNMax(const decimalNN lhs, const decimalNN rhs) [decNNMax]

The return value is the numeric maximum of lhs and rhs. Note that the regular numeric comparison is used, not the total order.

decimalNN decimalNNMin(const decimalNN lhs, const decimalNN rhs) [decNNMin]

The return value is the numeric minimum of lhs and rhs. Note that the regular numeric comparison is used, not the total order.

decimalNN decimalNNMinus(const decimalNN rhs) [decNNMinus]

The return value is the negated value of rhs. It can be used as the unary negation operation.

decimalNN decimalNNMultiply(const decimalNN lhs, const decimalNN rhs) [decNNMultiply]

Multiplies decimalNN inputs lhs and rhs and returns the result of the multiplication.

decimalNN decimalNNNormalize(const decimalNN rhs) [decNNNormalize]

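Returns a number numerically equal to rhs, derived from rhs by removing trailing zeros from the coefficient. The zeros are removed by dividing the coefficient by the appropriate power of ten and adjusting the exponent accordingly. Unlike decimalNNTrim(), which removes only fractional trailing zeros, this function removes all trailing zeros, so the resulting exponent may be positive.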
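
Compared with decimalNNTrim(), normalization keeps dividing out trailing zeros even after the exponent reaches 0. The sketch below uses a hypothetical (coefficient, exponent) pair with value coeff * 10^exp; normalize_sketch() is an illustrative name, not DFPAL code. Giving a zero coefficient exponent 0 is an assumption of this sketch, and the exponent limits of the real formats are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation (not a DFPAL type): value = coeff * 10^exp. */
typedef struct {
    uint64_t coeff;
    int32_t exp;
} dec_sketch;

/* Sketch of normalization: remove every trailing zero of the coefficient. */
dec_sketch normalize_sketch(dec_sketch x)
{
    if (x.coeff == 0) {
        x.exp = 0;                      /* assumption: zero gets exponent 0 */
        return x;
    }
    while (x.coeff % 10 == 0) {
        assert(x.exp < INT32_MAX);      /* guard; real format limits not modeled */
        x.coeff /= 10;
        x.exp++;
    }
    return x;
}
```

For example, 1200 (1200, 0) becomes 12E+2 (12, 2), and 1.2300 (12300, -4) becomes 1.23 (123, -2).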

decimalNN decimalNNPlus(const decimalNN rhs) [decNNPlus]

The return value is rhs with a plus sign prefix, that is, the result of the unary plus operation. This function is essentially a no-op, provided for symmetry with decimalNNMinus().

decimalNN decimalNNPower(const decimalNN lhs, const decimalNN rhs) [decNNPower]

The return value is the result of raising lhs to the power of rhs. Refer to the decNumber module for more information.

decimalNN decimalNNPowerInt(const decimalNN lhs, const int32_t rhs) [decNNPowerInt]

Same as decimalNNPower(), except that the second argument rhs is an integer. Useful for computing, for example, the square or cube of lhs.

decimalNN decimalNNQuantize(const decimalNN lhs, const decimalNN rhs) [decNNQuantize]

This function modifies lhs so that its exponent has a specific value, equal to the exponent of rhs; the coefficient is padded with zeros or rounded as needed. The decimalNNRescale() function may also be used for this purpose, but it requires the exponent to be given as a decimal number.
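
Quantization can be sketched on a hypothetical (signed coefficient, exponent) pair. quantize_sketch() is illustrative only and takes the target exponent directly, whereas decimalNNQuantize() takes it from rhs. It pads with zeros when the exponent must decrease and rounds half-even when it must increase. In the General Decimal Arithmetic Specification, quantize rounds with the context rounding mode and signals Invalid operation when the result would need more digits than the precision allows; the sketch models neither, and it does not check for overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation (not a DFPAL type): value = coeff * 10^exp,
 * with the sign carried by coeff. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Sketch of quantize: give x the exponent target_exp. */
dec_sketch quantize_sketch(dec_sketch x, int32_t target_exp)
{
    while (x.exp > target_exp) {        /* exponent too large: append zeros */
        x.coeff *= 10;
        x.exp--;
    }
    if (x.exp < target_exp) {           /* exponent too small: drop digits */
        int32_t shift = target_exp - x.exp;
        assert(shift <= 18);            /* keep 10^shift inside int64_t */
        int64_t div = 1;
        for (int32_t i = 0; i < shift; i++)
            div *= 10;
        int64_t q = x.coeff / div;      /* truncated toward zero */
        int64_t rem = x.coeff % div;    /* same sign as coeff */
        int64_t mag = rem < 0 ? -rem : rem;
        if (mag > div - mag || (mag == div - mag && q % 2 != 0))
            q += (x.coeff < 0) ? -1 : 1; /* round half-even, away from zero */
        x.coeff = q;
        x.exp = target_exp;
    }
    return x;
}
```

For example, 2.17 quantized to the exponent of 0.1 (that is, -1) becomes 2.2, and 2 quantized to exponent -2 becomes 2.00.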

decimalNN decimalNNRemainder(const decimalNN lhs, const decimalNN rhs) [decNNRemainder]

The return value is the remainder when lhs is divided by the rhs.

decimalNN decimalNNRemainderNear(const decimalNN lhs, const decimalNN rhs) [decNNRemainderNear]

The return value is the remainder when lhs is divided by the rhs, using the rules defined in IEEE 854. This follows the same definition as decimalNNRemainder(), except that the nearest integer (or the nearest even integer if the remainder is equidistant from two) is used for the quotient instead of the result from decimalNNDivideInteger().

For example, if lhs had the value 10 and rhs had the value 6 then the result would be -2 (instead of 4) because the nearest multiple of 6 is 12 (rather than 6).
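
The two remainder functions differ only in how the quotient is chosen, which can be sketched with plain integers. Both helpers below are hypothetical: remainder_trunc() uses the quotient truncated toward zero (C's % operator), while remainder_near() rounds the quotient to the nearest integer and breaks ties toward the even one.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder with the quotient truncated toward zero; sign of a. */
int64_t remainder_trunc(int64_t a, int64_t b)
{
    assert(b != 0);
    return a % b;
}

/* Remainder with the quotient rounded to the nearest integer, ties to even. */
int64_t remainder_near(int64_t a, int64_t b)
{
    assert(b != 0);
    int64_t q = a / b;                       /* truncated quotient */
    int64_t r = a - q * b;                   /* same sign as a */
    int64_t twice_r = 2 * (r < 0 ? -r : r);  /* compare 2|r| with |b| */
    int64_t abs_b = b < 0 ? -b : b;
    if (twice_r > abs_b || (twice_r == abs_b && q % 2 != 0)) {
        /* move the quotient one step toward the exact value of a/b */
        q += ((a < 0) == (b < 0)) ? 1 : -1;
        r = a - q * b;
    }
    return r;
}
```

With lhs 10 and rhs 6 the helpers give 4 and -2, matching the example above; remainder_near(9, 6) gives -3 because the quotient 1.5 rounds to the even integer 2.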

decimalNN decimalNNRescale(const decimalNN lhs, const decimalNN rhs) [decNNRescale]

This function rescales a number so that its exponent has a specific value, given by rhs. It is similar to decimalNNQuantize(), except that the second argument rhs specifies the new exponent itself. rhs must be a whole number (before any rounding); that is, any digits in the fractional part of the number must be zero. decimalNNQuantize() is faster and is recommended over decimalNNRescale(). This function may be removed in a future release.

decimalNN decimalNNSameQuantum(const decimalNN lhs, const decimalNN rhs) [decNNSameQuantum]

The return value is decimalNN 1 if lhs and rhs have the same exponent, decimalNN 0 otherwise.

decimalNN decimalNNSquareRoot(const decimalNN rhs) [decNNSquareRoot]

The return value is square root of the rhs.

decimalNN decimalNNSubtract(const decimalNN lhs, const decimalNN rhs) [decNNSubtract]

Subtracts rhs from lhs and returns the result of the subtraction.

decimalNN decimalNNToIntegralValue(const decimalNN rhs) [decNNToIntegralValue]

The return value is rhs with any fractional part removed, if necessary, using the current rounding mode. No exception flags, not even INEXACT, are set (unless the operand is sNaN). Unlike decimalNNToInt...(), the result may have a positive exponent.

decimalNN decimalNNCeil(const decimalNN rhs) [decNNCeil]

The return value is the ceiling of rhs. No exceptions, not even INEXACT, are raised (unless the operand is sNaN). The result may have a positive exponent.

decimalNN decimalNNFloor(const decimalNN rhs) [decNNFloor]

The return value is the floor of rhs. No exceptions, not even INEXACT, are raised (unless the operand is sNaN). The result may have a positive exponent.
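
decimalNNToIntegralValue(), decimalNNCeil(), and decimalNNFloor() differ only in how the discarded fraction is rounded. The sketch below uses a hypothetical (signed coefficient, exponent) pair and a hypothetical round_mode parameter in place of DFPAL's current rounding mode; exception flags are not modeled. Numbers whose exponent is already 0 or greater are returned unchanged, which is why the result may keep a positive exponent (1E+3 stays 1E+3).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation (not a DFPAL type): value = coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Hypothetical stand-in for the rounding mode. */
typedef enum { ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR } round_mode;

/* Sketch: remove the fractional part of x using the given rounding. */
dec_sketch to_integral_sketch(dec_sketch x, round_mode mode)
{
    if (x.exp >= 0)
        return x;                        /* already integral: keep exponent */
    assert(-x.exp <= 18);                /* keep 10^-exp inside int64_t */
    int64_t div = 1;
    for (int32_t i = 0; i < -x.exp; i++)
        div *= 10;
    int64_t q = x.coeff / div;           /* truncated toward zero */
    int64_t rem = x.coeff % div;         /* same sign as coeff */
    if (rem != 0) {
        int64_t mag = rem < 0 ? -rem : rem;
        switch (mode) {
        case ROUND_HALF_EVEN:
            if (mag > div - mag || (mag == div - mag && q % 2 != 0))
                q += (rem < 0) ? -1 : 1;
            break;
        case ROUND_CEILING:
            if (rem > 0)
                q += 1;                  /* positive fraction rounds up */
            break;
        case ROUND_FLOOR:
            if (rem < 0)
                q -= 1;                  /* negative fraction rounds down */
            break;
        }
    }
    x.coeff = q;
    x.exp = 0;
    return x;
}
```

For example, -1.5 becomes -1 with ROUND_CEILING and -2 with ROUND_FLOOR, while 2.5 becomes 2 with ROUND_HALF_EVEN.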

DFPAL is authored by Punit Shah (punit@us.ibm.com).
Please send any corrections, comments or questions to dfpal-l@austin.ibm.com.

This page was updated on 21 Dec 2007.