decNumber - decFloats modules

The decNumber Library, version 3.68. Copyright (c) IBM Corporation, 2010. All rights reserved. 23 Jan 2010

decFloats modules

The decFloats modules are decSingle, decDouble, and decQuad. These are based on the 32-bit, 64-bit, and 128-bit decimal types in the IEEE 754 Standard for Floating Point Arithmetic.

In contrast to the arbitrary-precision decNumber module, these modules work directly from the decimal-encoded formats designed by the IEEE 754 committee, which are also now implemented in IBM System z (z9 and z10) and IBM System p (POWER6) processors, and in SAP NetWeaver 7.1.^[1]

Conversions to and from the decNumber internal format are not needed (typically the numbers are represented internally in ‘unpacked’ BCD or in a base of some other power of ten), and no memory allocation is necessary, so these modules are much faster than using decNumber for implementing the types.

Like the decNumber module, the decFloats modules:

need only 32-bit integer support; 64-bit integers are not required and binary floating-point is not used
support both big-endian and little-endian encodings and platforms
support both ASCII/UTF8 and EBCDIC strings
are reentrant and use only aligned integers and strict aliasing
use only ANSI C.

The modules should therefore be usable on any platform with an ANSI C compiler that supports 32-bit integers.

The decFloats modules define the data structures and a large set of functions for working directly with the same compressed formats as decimal32, decimal64, and decimal128. The names are different to allow them to be used stand-alone or with the decNumber module, as illustrated in Examples 7 and 8 in the User’s Guide.

These three modules all share many of the same functions (working on the different sizes of the formats). The decQuad module has all the same functions as decDouble except for two functions which would convert to or from a wider format. The decSingle module is a limited (‘storage’) format which has only a few conversion and miscellaneous functions; it is intended that any computation be carried out in a wider format.

The remainder of this section therefore describes only the decDouble format – in the list of functions, assume that there is a corresponding decQuad format function unless stated and assume there is not a corresponding decSingle format function unless stated.

In this implementation each format is represented as an array of unsigned bytes. There is therefore just one field in the decDouble structure:

bytes: The bytes field represents the eight bytes of a decDouble number, using Densely Packed Decimal encoding for the coefficient. As of decNumber 3.56 the structure has been changed to a union of the bytes array with arrays of larger integers; see the header file for each type for details.^[2]

The storage of a number in the bytes array is assumed to follow the byte ordering (‘endianness’) of the computing platform (if big-endian, then bytes[0] contains the sign bit of the format). The code in these modules requires that the DECLITEND tuning parameter be set to match the endianness of the platform.
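The byte-ordering rule can be illustrated with a small stand-alone sketch. It is not part of the library, and both helper names are hypothetical. The first function detects the platform's endianness at run time (which could be compared against the DECLITEND setting chosen at compile time); the second gives the index of the element of the bytes array that would hold the sign bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian platform,
   0 on a big-endian platform. */
int platform_is_little_endian(void) {
  uint32_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
  return first == 1;
}

/* Hypothetical helper: index of the byte that holds the sign bit of
   an nbytes-long format stored in platform byte order. */
int sign_byte_index(int little_endian, int nbytes) {
  return little_endian ? nbytes - 1 : 0;
}
```

For an eight-byte decDouble this gives bytes[0] on a big-endian platform and bytes[7] on a little-endian one.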

The decSingle and decDouble modules both require that the next wider format be included in a program compilation (so that conversion to and from that wider format can be effected), hence the decQuad module is always needed.^[3] It therefore contains the constant lookup tables from the decDPD.h header file which are shared by all three modules. These tables are automatically generated and should not need altering.

Most of the code for these modules is included from the shared source files decCommon.c and decBasic.c. The former contains the functions available in all three modules ^[4] and the latter the functions available only in decDouble and decQuad.

Definitions

The decDouble.h header file defines the decDouble data structure described above. It includes the decContext.h and decQuad.h header files, which are both required for use.^[5] If more than one of the three decFloats formats are used in a program, it is only necessary to include the smaller or smallest.

The decDouble.h header file also contains:

Constants defining aspects of decDouble numbers, including the maximum precision, the minimum and maximum (adjusted) exponent supported, the bias applied to the exponent, the length of the number in bytes, and the maximum number of characters in the string form of the number (including terminator)
Definitions of the public functions in the decDouble module
Macros defining conversions to and from the decNumber format. These are macros in order to avoid a compile-time dependency on the decNumber module; they use decimal64 as a proxy, and their usage is shown in Example 8 in the User’s Guide.

Functions

The decDouble.c source file contains the public functions defined in the header file. These comprise conversions to and from strings and other formats, arithmetic and logical operations, and utilities.

The functions are described briefly, below. More details of the operation of each function can be found in the description of the corresponding function in the decNumber module and details of the underlying model and arithmetic can be found in the General Decimal Arithmetic Specification.^[6]

In the descriptions below, many parameters are defined as one of the following:

x, y, z: (const decDouble *) decimal input arguments to a function
r: (decDouble *) a decimal result argument to a function (which may be the same as an input argument); unless stated otherwise this is also the return value from the function, and the result will be canonical
set: (decContext *) the context for a function. Only two fields of the context structure are used: round (the rounding mode) and status (the bits in which are used to indicate any error, etc.).

Note that the trap field in the context is not used; the decDouble functions do not check for traps after every operation, to avoid the overhead that this would incur. The decContextSetStatus function can be used to explicitly test status to trap.

Note also that the only informational flag set by the decFloats modules is DEC_Inexact; the others are never set by these modules, in order to improve performance and also to avoid the need for passing a context to many functions.^[7]

In the following list, every function has a corresponding decQuad format function (for example, decQuadAbs(r, x, set)) unless stated, and does not have a corresponding decSingle format function unless stated.

decDoubleAbs(r, x, set)

Returns the absolute value of x. This has the same effect as decDoublePlus unless x is negative, in which case it has the same effect as decDoubleMinus. The effect is also the same as decDoubleCopyAbs except that NaNs are handled normally (the sign of a NaN is not affected, and an sNaN will set DEC_Invalid_operation) and the result will be canonical.

decDoubleAdd(r, x, y, set)

Adds x and y and places the result in r.

decDoubleAnd(r, x, y, set)

Carries out the digit-wise logical And of x and y and places the result in r.

The operands must be zero or positive (sign=0), an integer (finite with exponent=0) and comprise only zeros and/or ones; if not, DEC_Invalid_operation is set.
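The following is a minimal model of the digit-wise rule, working on plain digit arrays rather than on decDouble encodings. The function names and the NDIGITS constant are hypothetical, and the sign and exponent checks are omitted; only the "zeros and/or ones" condition and the per-digit And are shown.

```c
#include <stdint.h>

#define NDIGITS 16   /* assumed to match DECDOUBLE_Pmax for decDouble */

/* Returns 1 if every digit is 0 or 1 (a valid 'logical' coefficient). */
int digits_are_logical(const uint8_t d[NDIGITS]) {
  for (int i = 0; i < NDIGITS; i++) {
    if (d[i] > 1) return 0;
  }
  return 1;
}

/* Digit-wise And of x and y into r.  Returns 0 on success, or -1 in
   the case where the real function would set DEC_Invalid_operation. */
int logical_and(uint8_t r[NDIGITS], const uint8_t x[NDIGITS],
                const uint8_t y[NDIGITS]) {
  if (!digits_are_logical(x) || !digits_are_logical(y)) return -1;
  for (int i = 0; i < NDIGITS; i++) r[i] = x[i] & y[i];
  return 0;
}
```

The same validity test applies to decDoubleOr and decDoubleInvert, with the per-digit operation changed accordingly.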

decDoubleCanonical(r, x)

This copies x to r, ensuring that the encoding of r is canonical.

decDoubleClass(x)

This returns the class (enum decClass) of the argument x.

decDoubleClassString(x)

This returns a description of the class of the argument x as a string (const char *).

decDoubleCompare(r, x, y, set)

Compares x and y numerically and places the result in r.

The result may be –1, 0, 1, or NaN (unordered); –1 indicates that x is less than y, 0 indicates that they are numerically equal, and 1 indicates that x is greater than y. NaN is returned only if x or y is a NaN.

decDoubleCompareSignal(r, x, y, set)

The same as decDoubleCompare, except that a quiet NaN argument is treated like a signaling NaN (causes DEC_Invalid_operation to be set).

decDoubleCompareTotal(r, x, y)

Compares x and y using the IEEE 754 total ordering (which takes into account the exponent) and places the result in r. No status is set (a signaling NaN is ordered between Infinity and NaN). The result will be –1, 0, or 1.

decDoubleCompareTotalMag(r, x, y)

The same as decDoubleCompareTotal except that the absolute values of the two arguments are used (as though modified by decDoubleCopyAbs).

decDoubleCopy(r, x)

Copies x to r quietly (no status is set). This is a bit-wise operation and so the result might not be canonical.

decDoubleCopyAbs(r, x)

Copies x to r quietly and sets the sign of r to 0 (no status is set). This is a bit-wise operation and so the result might not be canonical.

decDoubleCopyNegate(r, x)

Copies x to r quietly and inverts the sign of r (no status is set). This is a bit-wise operation and so the result might not be canonical.

decDoubleCopySign(r, x, y)

Copies x to r quietly with the sign of r set to the sign of y (no status is set). This is a bit-wise operation and so the result might not be canonical.

decDoubleDigits(x)

Returns the number of significant digits in x as an unsigned 32-bit integer (uint32_t). If x is a zero or is infinite, 1 is returned. If x is a NaN then the number of digits in the payload is returned.
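As a sketch of the counting rule for finite numbers, the function below counts the significant digits of a coefficient held one digit per byte, most significant first. The name significant_digits is hypothetical and the code is a model of the description, not the library's implementation; NaN payloads are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the significant digits of a coefficient stored one digit per
   byte, most significant digit first.  A zero coefficient counts as
   one digit, matching the rule described for zeros. */
uint32_t significant_digits(const uint8_t *bcd, size_t len) {
  size_t i = 0;
  while (i < len && bcd[i] == 0) i++;   /* skip leading zeros */
  if (i == len) return 1;               /* all digits are zero */
  return (uint32_t)(len - i);
}
```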

decDoubleDivide(r, x, y, set)

Divides x by y and places the result in r.

decDoubleDivideInteger(r, x, y, set)

Divides x by y and places the integer part of the result (rounded towards zero) in r with exponent=0. If the result would not fit in r (because it would have more than DECDOUBLE_Pmax digits) then DEC_Division_impossible is set.

decDoubleFMA(r, x, y, z, set)

Calculates the fused multiply-add x × y + z and places the result in r. The multiply is carried out first and is exact, so this operation has only the one, final, rounding.

decDoubleFromBCD(r, exp, bcd, sign)

Sets r from an exponent exp (which may indicate a NaN or infinity), a BCD array bcd, and a sign.

exp (int32_t) is an in-range unbiased exponent or a special value in the form returned by decDoubleGetExponent (listed in decQuad.h).

bcd (const uint8_t *) is an array of DECDOUBLE_Pmax elements, one digit in each byte (BCD8 encoding); the first (most significant) digit is ignored if the result will be a NaN; all are ignored if the result is infinite. All bytes must be in the range 0–9.

sign (int32_t) is an integer which must be DECFLOAT_Sign to set the sign bit of r to 1, or 0 to set it to 0.

For speed, the arguments are not checked; no status is set by this function. The content of r is undefined if the arguments are invalid or out of range (that is, could not be produced by decDoubleToBCD).

(This function is also available in the decSingle module.)
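To show what a BCD8 array looks like, the sketch below fills one from an unsigned integer. The helper uint64_to_bcd8 and the PMAX constant are hypothetical (PMAX is assumed to equal DECDOUBLE_Pmax, which is 16 for decDouble); the helper is not part of the library.

```c
#include <stdint.h>

#define PMAX 16   /* assumed equal to DECDOUBLE_Pmax */

/* Fills bcd[0..PMAX-1] with the decimal digits of value, one digit per
   byte, most significant first, zero-padded on the left.  Returns 0 on
   success, or -1 if value needs more than PMAX digits. */
int uint64_to_bcd8(uint64_t value, uint8_t bcd[PMAX]) {
  for (int i = PMAX - 1; i >= 0; i--) {
    bcd[i] = (uint8_t)(value % 10);
    value /= 10;
  }
  return value == 0 ? 0 : -1;
}
```

An array prepared this way, together with an exponent and a sign value, has the shape expected for the bcd argument of decDoubleFromBCD.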

decDoubleFromInt32(r, i)

Sets r from the signed 32-bit integer i (int32_t). The result is exact; no error is possible.

decDoubleFromNumber(r, dn, set)

This function is implemented as a macro and sets r from a decNumber, dn, using a decimal64 as a proxy as illustrated in Example 8 in the User’s Guide.

To use this macro, the decimal64.h header file must be included (see the text following the example for more details about compilation).

(This function is also available in the decSingle module.)

decDoubleFromPacked(r, exp, pack)

Sets r from an exponent exp (which may indicate a special value) and a packed BCD array, pack.

exp (int32_t) is an in-range unbiased exponent or a special value in the form returned by decDoubleGetExponent (listed in decQuad.h).

pack (const uint8_t *) is an array of DECDOUBLE_Pmax packed decimal digits (one digit per four-bit nibble) followed by a sign nibble, and (for decDouble and decQuad only) prefixed with an extra pad nibble (which is ignored); the sign nibble must be any of the six sign codes listed in decQuad.h and described for the decPacked module, and digit nibbles must be in the range 0–9.

Like the decDoubleFromBCD function, the first nibble of pack (after the pad nibble, if any) is ignored if the result will be a NaN, and all are ignored if the result is infinite.

(This function is also available in the decSingle module.)
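The packed layout for decDouble (one pad nibble, DECDOUBLE_Pmax digit nibbles, then a sign nibble) can be sketched as follows. The helper name and local constants are hypothetical; the two sign values are believed to match DECPPLUS and DECPMINUS in decPacked.h, but that should be checked against the header.

```c
#include <stdint.h>

#define PMAX 16                        /* assumed equal to DECDOUBLE_Pmax */
#define PACKED_BYTES ((PMAX + 2) / 2)  /* pad + 16 digits + sign = 9 bytes */
#define SIGN_PLUS  0x0C                /* assumed value of DECPPLUS  */
#define SIGN_MINUS 0x0D                /* assumed value of DECPMINUS */

/* Packs a BCD8 coefficient (one digit per byte, most significant first)
   into packed decimal: a zero pad nibble, the digits, then the sign. */
void bcd8_to_packed(const uint8_t bcd[PMAX], int negative,
                    uint8_t pack[PACKED_BYTES]) {
  uint8_t nib[PMAX + 2];
  nib[0] = 0;                                       /* pad nibble */
  for (int i = 0; i < PMAX; i++) nib[i + 1] = bcd[i];
  nib[PMAX + 1] = negative ? SIGN_MINUS : SIGN_PLUS;
  for (int k = 0; k < PACKED_BYTES; k++) {
    pack[k] = (uint8_t)((nib[2 * k] << 4) | nib[2 * k + 1]);
  }
}
```

With the coefficient 12345 and a positive sign, the last three bytes of pack are 0x12, 0x34, and 0x5C.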

decDoubleFromPackedChecked(r, exp, pack)

Sets r from an exponent exp (which may indicate a special value) and a packed BCD array, pack, with the input values fully checked.

exp (int32_t) must be an in-range unbiased exponent or a special value in the form returned by decDoubleGetExponent (listed in decQuad.h).

pack (const uint8_t *) is an array of DECDOUBLE_Pmax packed decimal digits (one digit per four-bit nibble) followed by a sign nibble, and (for decDouble and decQuad only) prefixed with an extra pad nibble (which must be zero); the sign nibble must be one of the six sign codes listed in decQuad.h and described for the decPacked module, and digit nibbles must be in the range 0–9.

The first nibble of pack (after the pad nibble, if any) must be zero if the result will be a NaN, and all digit nibbles must be zero if the result is infinite.

No status is set by this function. NULL is returned instead of r if an argument is invalid or out of range (that is, could not be produced by decDoubleToPacked, except that all six sign codes are permitted).

(This function is also available in the decSingle module.)

decDoubleFromString(r, string, set)

Sets r from a character string, string (const char *).

The length of the coefficient and the size of the exponent are checked by this routine, so rounding will be applied if necessary, and this may set status flags (for example, underflow or overflow).

There is no limit to the coefficient length for finite inputs; NaN payloads must be integers with no more than DECDOUBLE_Pmax–1 digits. Exponents may have up to nine significant digits. The syntax of the string is fully checked; if it is not valid, the result will be a quiet NaN and an error flag will be set.

(This function is also available in the decSingle module.)

decDoubleFromUInt32(r, u)

Sets r from the unsigned 32-bit integer u (uint32_t). The result is exact and no error is possible.

decDoubleFromWider(r, dq, set)

Sets r from an instance, dq, of the next-wider format (const decQuad *). This narrowing function can cause rounding, overflow, etc., but not Invalid operation (sNaNs are copied and do not signal).

(This function is also available in the decSingle module, but is not available in the decQuad module.)

decDoubleGetCoefficient(x, bcd)

Extracts the coefficient of x as a BCD integer into the array bcd (uint8_t *) and returns the sign as a signed 32-bit integer (int32_t). The returned value will be DECFLOAT_Sign if x has sign=1 or otherwise will be 0.

The digits of the coefficient are written, one digit per byte, into DECDOUBLE_Pmax elements of the bcd array. If x is a NaN the first byte will be zero (the remainder will be the payload), and if it is infinite then all of bcd will be zero.

(This function is also available in the decSingle module.)

decDoubleGetExponent(x)

Returns the exponent of x as a 32-bit integer (int32_t). If x is infinite or is a NaN (a special value) the first seven bits of the decDouble are returned, padded with 25 zero bits on the right and with the most significant (sign) bit set to 0. For example, –sNaN would return 0x7e000000 (DECFLOAT_sNaN). The possible return values for infinities and NaNs are listed in decQuad.h.

(This function is also available in the decSingle module.)
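For special values, the description above amounts to masking the most significant 32 bits of the encoding. The sketch below models that step only; the function name and the CODE_ constants are local to this example (their values follow from the IEEE 754 encodings of Infinity, quiet NaN, and signaling NaN), and the finite case, which returns the unbiased exponent, is not modelled.

```c
#include <stdint.h>

/* Local names for the codes produced for special values. */
#define CODE_INF  0x78000000   /* Infinity      */
#define CODE_QNAN 0x7c000000   /* quiet NaN     */
#define CODE_SNAN 0x7e000000   /* signaling NaN */

/* top_word is the most significant 32 bits of the encoding of a
   special value.  The first seven bits are kept, the sign bit is
   cleared, and the remaining 25 bits are zeroed. */
int32_t special_exponent_code(uint32_t top_word) {
  return (int32_t)(top_word & 0x7e000000u);
}
```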

decDoubleInvert(r, x, set)

Carries out the digit-wise logical inversion of x and places the result in r.

The operand must be zero or positive (sign=0), an integer (finite with exponent=0) and comprise only zeros and/or ones; if not, DEC_Invalid_operation is set.

decDoubleIsCanonical(x)

Returns an unsigned integer (uint32_t) which will be 1 if the encoding of x is canonical, or 0 otherwise.

decDoubleIsFinite(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is neither infinite nor a NaN, or 0 otherwise.

decDoubleIsInfinite(x)

Returns an unsigned integer (uint32_t) which will be 1 if the encoding of x is an infinity, or 0 otherwise.

decDoubleIsInteger(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is finite and has exponent=0, or 0 otherwise.

decDoubleIsLogical(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a valid argument for logical operations (that is, x is zero or positive (sign=0), an integer (finite with exponent=0) and comprises only zeros and/or ones), or 0 otherwise.

decDoubleIsNaN(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a NaN (quiet or signaling), or 0 otherwise.

decDoubleIsNegative(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is less than zero and not a NaN, or 0 otherwise.

decDoubleIsNormal(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a normal number (that is, is finite, non-zero, and not subnormal), or 0 otherwise.

decDoubleIsPositive(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is greater than zero and not a NaN, or 0 otherwise.

decDoubleIsSignaling(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a signaling NaN, or 0 otherwise.

decDoubleIsSignalling(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a signaling NaN, or 0 otherwise. (This is an alternative spelling of decDoubleIsSignaling.)

decDoubleIsSigned(x)

Returns an unsigned integer (uint32_t) which will be 1 if x has sign=1, or 0 otherwise. Note that zeros and NaNs may have sign=1.

decDoubleIsSubnormal(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is subnormal (that is, finite, non-zero, and with magnitude less than 10^emin), or 0 otherwise.

decDoubleIsZero(x)

Returns an unsigned integer (uint32_t) which will be 1 if x is a zero, or 0 otherwise.

decDoubleLogB(r, x, set)

Returns the adjusted exponent of x, according to IEEE 754 rules. That is, the exponent returned is calculated as if the decimal point followed the first significant digit (so, for example, if x were 123 then the result would be 2).

If x is infinite, the result is +Infinity. If x is a zero, the result is –Infinity, and the DEC_Division_by_zero flag is set. If x is less than zero, the absolute value of x is used. If x=1, the result is 0. NaNs are handled (propagated) as for arithmetic operations.
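For a finite, non-zero number the adjusted exponent is the exponent plus the number of significant coefficient digits, less one. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the special cases listed above are not handled.

```c
#include <stdint.h>

/* Adjusted exponent of a finite non-zero number whose coefficient has
   'digits' significant digits and whose exponent is 'exponent'. */
int32_t adjusted_exponent(int32_t exponent, uint32_t digits) {
  return exponent + (int32_t)digits - 1;
}
```

For example, 123 has coefficient 123 and exponent 0, giving 2; 0.00123 has coefficient 123 and exponent –5, giving –3.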

decDoubleMax(r, x, y, set)

If both arguments are numeric (not NaNs) this returns the larger of x and y (compared using decDoubleCompareTotal, to give a well-defined result).

If either (but not both of) x or y is a quiet NaN then the other argument is the result; otherwise NaNs are handled as for arithmetic operations.

decDoubleMaxMag(r, x, y, set)

The same as decDoubleMax except that the absolute values of the two arguments are used (as though modified by decDoubleCopyAbs).

decDoubleMin(r, x, y, set)

If both arguments are numeric (not NaNs) this returns the smaller of x and y (compared using decDoubleCompareTotal, to give a well-defined result).

If either (but not both of) x or y is a quiet NaN then the other argument is the result; otherwise NaNs are handled as for arithmetic operations.

decDoubleMinMag(r, x, y, set)

The same as decDoubleMin except that the absolute values of the two arguments are used (as though modified by decDoubleCopyAbs).

decDoubleMinus(r, x, set)

This has the same effect as 0–x where the exponent of the zero is the same as that of x (if x is finite). The effect is also the same as decDoubleCopyNegate except that NaNs are handled as for arithmetic operations (the sign of a NaN is not affected, and an sNaN will signal), the result is canonical, and a zero result has sign=0.

decDoubleMultiply(r, x, y, set)

Multiplies x by y and places the result in r.

decDoubleNextMinus(r, x, set)

Returns the ‘next’ decDouble to x in the direction of –Infinity according to IEEE 754 rules for nextDown. The only status possible is DEC_Invalid_operation (from an sNaN).

decDoubleNextPlus(r, x, set)

Returns the ‘next’ decDouble to x in the direction of +Infinity according to IEEE 754 rules for nextUp. The only status possible is DEC_Invalid_operation (from an sNaN).

decDoubleNextToward(r, x, y, set)

Returns the ‘next’ decDouble to x in the direction of y according to proposed IEEE 754 rules for nextAfter.^[8]

If x=y the result is decDoubleCopySign(r, x, y). If either operand is a NaN the result is as for arithmetic operations. Otherwise (the operands are numeric and different) the result of adding (or subtracting) an infinitesimal positive amount to x and rounding towards +Infinity (or –Infinity) is returned, depending on whether y is larger (or smaller) than x. The addition will set flags, except that if the result is normal (finite, non-zero, and not subnormal) no flags are set.

decDoubleOr(r, x, y, set)

Carries out the digit-wise logical inclusive Or of x and y and places the result in r.

The operands must be zero or positive (sign=0), an integer (finite with exponent=0) and comprise only zeros and/or ones; if not, DEC_Invalid_operation is set.

decDoublePlus(r, x, set)

This has the same effect as 0+x where the exponent of the zero is the same as that of x (if x is finite). The effect is also the same as decDoubleCopy except that NaNs are handled as for arithmetic operations (the sign of a NaN is not affected, and an sNaN will signal), the result is canonical, and a zero result has sign=0.

decDoubleQuantize(r, x, y, set)

Returns x set to have the same quantum as y, if possible (that is, numerically the same value but rounded or padded if necessary to have the same exponent as y, for example to round a monetary quantity to cents). More details and an example are given with the decNumberQuantize function.

decDoubleRadix(x)

Returns an unsigned integer (uint32_t) set to the base used for arithmetic in this module (always ten).

(This function is also available in the decSingle module.)

decDoubleReduce(r, x, set)

Returns a copy of x with its coefficient reduced to its shortest possible form without changing the value of the result. This removes all possible trailing zeros from the coefficient (some may remain when the number is very close to the most positive or most negative number). Infinities and NaNs are unchanged and no status is set unless x is an sNaN. If x is a zero the result exponent is 0.

decDoubleRemainder(r, x, y, set)

Integer-divides x by y and places the remainder from the division in r. That is, if the same x and y were given to the decDoubleDivideInteger and decDoubleRemainder functions, resulting in int and rem respectively, then the identity x = (int × y) + rem holds.

Note that, as for decDoubleDivideInteger, it must be possible to express the intermediate result (int) as an integer. That is, it must have no more than DECDOUBLE_Pmax digits. If it has too many then DEC_Division_impossible is raised.
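The identity can be checked with ordinary C integers, whose division also truncates towards zero. This is only an analogy using int64_t values, not decimal arithmetic; the type and function names are hypothetical, and y must be non-zero.

```c
#include <stdint.h>

typedef struct {
  int64_t quot;   /* plays the role of 'int' in the identity */
  int64_t rem;    /* plays the role of 'rem' in the identity */
} divrem_t;

/* Integer analogue of divide-integer and remainder (y must not be 0). */
divrem_t divide_integer_and_remainder(int64_t x, int64_t y) {
  divrem_t out;
  out.quot = x / y;   /* C99 division truncates towards zero */
  out.rem  = x % y;   /* a non-zero remainder takes the sign of x */
  return out;
}
```

For x = –7 and y = 2 this gives int = –3 and rem = –1, and (–3 × 2) + (–1) = –7.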

decDoubleRemainderNear(r, x, y, set)

This is the same as decDoubleRemainder except that the nearest integer (or the nearest even integer if the remainder is equidistant from two) is used for int instead of the result from decDoubleDivideInteger. Again, that integer must fit.
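A rough integer analogue of the nearest-integer rule is sketched below. It starts from the truncated quotient and moves it one step further from zero when that lands on the nearest integer (or on the even integer in a tie). The function name is hypothetical, y must be non-zero, and the values are assumed small enough that 2 × rem does not overflow.

```c
#include <stdint.h>
#include <stdlib.h>

/* Integer analogue of remainder-near: returns x - n*y, where n is the
   integer nearest to x/y (ties go to the even integer). */
int64_t remainder_near(int64_t x, int64_t y) {
  int64_t q = x / y;                     /* truncated quotient */
  int64_t r = x % y;                     /* remainder with the sign of x */
  long long twice_r = llabs(2 * r);
  long long abs_y = llabs(y);
  int q_is_odd = (q % 2) != 0;
  if (twice_r > abs_y || (twice_r == abs_y && q_is_odd)) {
    /* step the quotient away from zero, in the direction of x/y */
    int64_t step = ((r < 0) != (y < 0)) ? -1 : 1;
    r -= step * y;
  }
  return r;
}
```

For example, 10 and 6 give –2 (the nearest integer to 10/6 is 2), while 10 and 3 give 1.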

decDoubleRotate(r, x, y, set)

The result is a copy of x with the digits of the coefficient rotated to the left (if y is positive) or to the right (if y is negative) without adjusting the exponent or the sign of x.

y is the count of positions to rotate and must be a finite integer (with exponent=0) in the range –DECDOUBLE_Pmax through +DECDOUBLE_Pmax. NaNs are propagated as usual. If x is infinite the result is Infinity of the same sign. No status is set unless y is invalid or an operand is an sNaN.
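The effect on the coefficient can be modelled on a plain digit array, most significant digit first. The function name and PMAX are hypothetical (PMAX is assumed to equal DECDOUBLE_Pmax), r must not overlap x, and the return value of -1 stands in for the invalid-count case.

```c
#include <stdint.h>

#define PMAX 16   /* assumed equal to DECDOUBLE_Pmax */

/* Rotates the PMAX-digit coefficient x by count positions into r:
   to the left if count is positive, to the right if negative.
   Returns 0 on success, or -1 if count is out of range. */
int rotate_digits(uint8_t r[PMAX], const uint8_t x[PMAX], int count) {
  if (count < -PMAX || count > PMAX) return -1;
  int shift = ((count % PMAX) + PMAX) % PMAX;   /* left rotation, 0..PMAX-1 */
  for (int i = 0; i < PMAX; i++) r[i] = x[(i + shift) % PMAX];
  return 0;
}
```

Rotating the coefficient 12 one place to the right moves the digit 2 to the most significant position, leaving 1 as the least significant digit.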

decDoubleSameQuantum(x, y)

Returns an unsigned integer (uint32_t) which will be 1 if the operands have the same exponent or are both NaNs (quiet or signaling) or both infinite. In all other cases, 0 is returned. No error or status is possible.

decDoubleScaleB(r, x, y, set)

This calculates x × 10^y and places the result in r. y must be an integer (finite with exponent=0) in the range ±2 × (DECDOUBLE_Pmax + DECDOUBLE_Emax), typically resulting from decDoubleLogB. Underflow and overflow might occur. NaNs propagate as usual.

decDoubleSetCoefficient(r, bcd, sign)

Sets the coefficient of r from a BCD integer in the array bcd (uint8_t *) and the signed 32-bit integer (int32_t) sign. bcd must have DECDOUBLE_Pmax elements in the range 0–9, and sign must be DECFLOAT_Sign to set the sign bit of r to 1, or 0 to set it to 0.

If r is a NaN the first byte of bcd will be ignored (the remainder will be the payload), and if it is infinite then all of bcd will be ignored (the coefficient will become zero).

For speed, the arguments are not checked; no status is set by this function. The result is undefined if the arguments are invalid or out of range (that is, could not have been produced by decDoubleGetCoefficient).

(This function is also available in the decSingle module.)

decDoubleSetExponent(r, set, exp)

Sets the exponent of r from the signed 32-bit integer (int32_t) exp. exp is either an in-range exponent or a special code as returned by decDoubleGetExponent. If r becomes infinite then its coefficient is set to zero, if it becomes NaN then the first digit of the coefficient is lost,^[9] otherwise the coefficient is unchanged.

For speed, exp is not checked; however, underflow or overflow can result. The result is undefined if exp is not a value that could have been produced by decDoubleGetExponent.

(This function is also available in the decSingle module.)

decDoubleShift(r, x, y, set)

The result is a copy of x with the digits of the coefficient shifted to the left (if y is positive) or to the right (if y is negative) without adjusting the exponent or the sign of x. Any digits ‘shifted in’ from the left or from the right will be 0.

y is the count of positions to shift and must be a finite integer (with exponent=0) in the range –DECDOUBLE_Pmax through +DECDOUBLE_Pmax. NaNs are propagated as usual. If x is infinite the result is Infinity of the same sign. No status is set unless y is invalid or an operand is an sNaN.
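A corresponding model of the shift, again on a plain digit array with the most significant digit first, is sketched here. The names are hypothetical, PMAX is assumed to equal DECDOUBLE_Pmax, and r must not overlap x.

```c
#include <stdint.h>

#define PMAX 16   /* assumed equal to DECDOUBLE_Pmax */

/* Shifts the PMAX-digit coefficient x by count positions into r:
   to the left if count is positive, to the right if negative.
   Vacated positions are filled with 0.  Returns 0 on success, or -1
   if count is out of range. */
int shift_digits(uint8_t r[PMAX], const uint8_t x[PMAX], int count) {
  if (count < -PMAX || count > PMAX) return -1;
  for (int i = 0; i < PMAX; i++) {
    int src = i + count;   /* a left shift takes digits from the right */
    r[i] = (src >= 0 && src < PMAX) ? x[src] : 0;
  }
  return 0;
}
```

Shifting the coefficient 12 two places to the left gives 1200; shifting it one place to the right gives 1.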

decDoubleShow(x, tag)

This function uses printf to display a readable rendering of x, showing both the encoding (in hexadecimal) and the value, and returns nothing (void). The string tag (const char *) is included in the display and may be used as an identifier for the displayed data.

This function is intended as a debug aid. It is not a programming interface – the format of the displayed data may change from release to release.

(This function is also available in the decSingle module.)

decDoubleSubtract(r, x, y, set)

Subtracts y from x and places the result in r.

decDoubleToBCD(x, exp, bcd)

Converts x into an exponent exp (int32_t *) and a BCD array bcd (uint8_t *). exp is set to the value that would be returned by decDoubleGetExponent(x), and bcd and the returned integer (int32_t) are as from decDoubleGetCoefficient(x, bcd).

(This function is also available in the decSingle module.)

decDoubleToEngString(x, string)

The same as decDoubleToString(x, string) except that if exponential notation is used the exponent will be a multiple of 3 (‘engineering notation’).

(This function is also available in the decSingle module.)
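When exponential notation is used for a non-zero number, the engineering exponent can be understood as the largest multiple of 3 that does not exceed the adjusted exponent. The sketch below shows that calculation only; the function name is hypothetical, and zeros (which have their own formatting rules) are not covered.

```c
#include <stdint.h>

/* Given the adjusted exponent of a finite non-zero number, returns the
   exponent that engineering notation would display. */
int32_t engineering_exponent(int32_t adjusted) {
  int32_t m = adjusted % 3;
  if (m < 0) m += 3;        /* floor-style modulus for negative values */
  return adjusted - m;
}
```

For example, an adjusted exponent of –7 (as for 1E–7) gives –9, corresponding to the form 100E–9.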

decDoubleToInt32(x, set, round)

Returns a signed 32-bit integer (int32_t) which is the value of x, rounded to an integer if necessary using the explicit rounding mode round (enum rounding) instead of the rounding mode in set.

If x is infinite, is a NaN, or after rounding is outside the range of the result, then DEC_Invalid_operation is set. The DEC_Inexact flag is not set by this function, even if rounding occurred.

decDoubleToInt32Exact(x, set, round)

The same as decDoubleToInt32 except that if rounding removes non-zero digits then the DEC_Inexact flag is set.

decDoubleToIntegralExact(r, x, set)

Returns the value of x, rounded to an integral value using the rounding mode in set.

If x is infinite, Infinity of the same sign is returned. If x is a NaN, the result is as for other arithmetic operations. If rounding removes non-zero digits then the DEC_Inexact flag is set.

decDoubleToIntegralValue(r, x, set, round)

Returns the value of x, rounded to an integral value using the explicit rounding mode round (enum rounding) instead of the rounding mode in set.

If x is infinite, Infinity of the same sign is returned. If x is a NaN, the result is as for other arithmetic operations. The DEC_Inexact flag is not set by this function, even if rounding occurred.

decDoubleToNumber(x, dn)

This function is implemented as a macro and sets a decNumber, dn, from x using a decimal64 as a proxy as illustrated in Example 8 in the User’s Guide. The decNumber must have sufficient space for the digits in x.

To use this macro, the decimal64.h header file must be included (see the text following the example for more details). A pointer to dn is returned (decNumber *).

(This function is also available in the decSingle module.)

decDoubleToPacked(x, exp, pack)

Converts x into an exponent exp (int32_t *) and a Packed BCD array pack (uint8_t *). exp is set to the value that would be returned by decDoubleGetExponent(x).

pack receives DECDOUBLE_Pmax packed decimal digits (one digit per four-bit nibble) followed by a sign nibble and prefixed (for decDouble and decQuad only) with an extra pad nibble (which is 0). The sign nibble will be DECPMINUS if x has sign=1 or DECPPLUS otherwise. The digit nibbles will be in the range 0–9.

A signed 32-bit integer (int32_t) is returned; it will be DECFLOAT_Sign if x has sign=1 or otherwise will be 0.

(This function is also available in the decSingle module.)

decDoubleToString(x, string)

Converts x to a zero-terminated string in the character array string (char *) and returns string. string must have at least DECDOUBLE_String elements (this count includes the terminator character).

Finite numbers will be converted to a string with exponential notation if the exponent is positive or if the magnitude of x is less than 1 and would require more than five zeros between the decimal point and the first significant digit.

Note that strings which are not simply numbers (one of Infinity, –Infinity, NaN, or sNaN) are possible. A NaN string may have a leading – sign and/or following payload digits. No digits follow the NaN string if the payload is 0.

(This function is also available in the decSingle module.)
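The rule for choosing exponential notation can be expressed in terms of the exponent and the adjusted exponent: "more than five zeros between the decimal point and the first significant digit" corresponds to an adjusted exponent below –6. The sketch below is a model of that rule for finite numbers; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a finite number with the given exponent and number of
   significant coefficient digits would be shown in exponential
   notation, or 0 if it would be shown in plain form. */
int uses_exponential_notation(int32_t exponent, uint32_t digits) {
  int32_t adjusted = exponent + (int32_t)digits - 1;
  return exponent > 0 || adjusted < -6;
}
```

So a coefficient of 1 with exponent –6 (0.000001) stays in plain form, whereas exponent –7 gives exponential notation.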

decDoubleToUInt32(x, set, round)

Returns an unsigned 32-bit integer (uint32_t) which is the value of x, rounded to an integer if necessary using the explicit rounding mode round (enum rounding) instead of the rounding mode in set.

If x is infinite, is a NaN, or after rounding is outside the range of the result, then DEC_Invalid_operation is set. The DEC_Inexact flag is not set by this function, even if rounding occurred.

Note that –0 converts to 0 and is valid, but negative numbers are not valid.

Footnotes:

[1]	IBM, the IBM logo, System p, System z, and POWER6 are trademarks of International Business Machines Corporation in the United States, other countries, or both. SAP and SAP NetWeaver are trademarks of SAP AG, in Germany, other countries, or both.
[2]	See `http://speleotrove.com/decimal/DPDecimal.html` for a summary of Densely Packed Decimal encoding.
[3]	This requirement is different from the decimal32, decimal64, and decimal128 modules because they can convert to wider or narrower formats using the decNumber format as an intermediate step.
[4]	Except that the widening and narrowing functions are not used by decQuad.
[5]	The `decSingle.h` header file also includes `decDouble.h`, but the `decQuad.h` header file only includes `decContext.h`.
[6]	See `http://speleotrove.com/decimal/#arithmetic` for details.
[7]	The `DEC_Subnormal` flag is particularly expensive to maintain.
[8]	The nextAfter operation was dropped from the proposed standard during the ballot process.
[9]	A NaN payload has one fewer digit than the coefficient of a finite number.
