Operations and Elementary Functions

Bibliography of material on Decimal Arithmetic

Decimal Arithmetic: Operations and Elementary Functions

agrawal1974
Fast B. C. D. Multiplier, Dharma P. Agrawal, Electronics Letters, Vol. 10 #12, pp237–238, IEE, 13 June 1974.
Abstract: A fast b.c.d multiplier is proposed, based on obtaining the product of a 1-digit multiplicand and a 1-digit multiplier in a single row of adders. For high-speed operation, the carry-save technique, universally adopted for binary multipliers, is used.

aharoni2007
Solving Constraints on the Intermediate Result of Decimal Floating-Point Operations, Merav Aharoni, Ron Maharik, and Abraham Ziv, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp38–45, IEEE, June 2007.
Abstract: The draft revision of the IEEE Standard for Floating-Point Arithmetic (IEEE P754) includes a definition for decimal floating-point (FP) in addition to the widely used binary FP specification. The decimal standard raises new concerns with regard to the verification of hardware- and software-based designs. The verification process normally emphasizes intricate corner cases and uncommon events. The decimal format introduces several new classes of such events in addition to those characteristic of binary FP. Our work addresses the following problem: Given a decimal floating-point operation, a constraint on the intermediate result, and a constraint on the representation selected for the result, find random inputs for the operation that yield an intermediate result compatible with these specifications. The paper supplies efficient analytic solutions for addition and for some cases of multiplication and division. We provide probabilistic algorithms for the remaining cases. These algorithms prove to be efficient in the actual implementation.

ahmad1987
Implementable Decimal Arithmetic Algorithms for Micro/Minicomputers, M. Ahmad, Microprocessing and Microprogramming, Vol. 19 #2, pp119–128, February 1987.
Abstract: The need for efficient decimal arithmetic and its ever increasing applications in micro/minicomputers and microprocessor based equipment and appliances has been emphasised. Some algorithms suitable for implementation for decimal arithmetic operations of BCD packed decimal numbers have been suggested. These algorithms employ comparatively faster instructions available on most of the microprocessors and provide efficient and faster decimal arithmetic.

bernal2006
Integer Representation of Decimal Numbers for Exact Computations, Javier Bernal and Christoph Witzgall, Journal of Research of the National Institute of Standards and Technology, Vol. 111 #2, pp79–88, National Institute of Standards and Technology, March-April 2006.
Abstract: A scheme is presented and software is documented for representing as integers input decimal numbers that have been stored in a computer as double precision floating point numbers and for carrying out multiplications, additions and subtractions based on these numbers in an exact manner. The input decimal numbers must not have more than nine digits to the left of the decimal point. The decimal fractions of their floating point representations are all first rounded off at a prespecified location, a location no more than nine digits away from the decimal point. The number of digits to the left of the decimal point for each input number besides not being allowed to exceed nine must then be such that the total number of digits from the leftmost digit of the number to the location where round-off is to occur does not exceed fourteen.

biswas2008
A Novel Approach to Design BCD Adder and Carry Skip BCD Adder, Ashis Kumer Biswas, Md. Mahmudul Hasan, Moshaddek Hasan, Ahsan Raja Chowdhury, and Hafiz Md. Hasan Babu, Proceedings of the 21st International Conference on VLSI Design (VLSID '08), ISBN 0-7695-3083-4, pp566–571, IEEE Computer Society, January 2008.
Abstract: Reversible logic has become one of the most promising research areas in the past few decades and has found its applications in several technologies; such as low power CMOS, nanocomputing and optical computing. This paper presents improved and efficient reversible logic implementations for Binary Coded Decimal (BCD) adder as well as Carry Skip BCD adder. It has been shown that the modified designs outperform the existing ones in terms of number of gates, number of garbage output and delay.

biswas2008b
Efficient approaches for designing reversible Binary Coded Decimal adders, Ashis Kumer Biswas, Md. Mahmudul Hasan, Ahsan Raja Chowdhury, and Hafiz Md. Hasan Babu, Microelectronics Journal, Vol. 39 #12, ISSN 0026-2692, pp1693–1703, Elsevier, December 2008.
Abstract: Reversible logic has become one of the most promising research areas in the past few decades and has found its applications in several technologies; such as low-power CMOS, nanocomputing and optical computing. This paper presents improved and efficient reversible logic implementations for Binary Coded Decimal (BCD) adder as well as Carry Skip BCD adder. It has been shown that the modified designs outperform the existing ones in terms of number of gates, number of garbage outputs, delay, and quantum cost. In order to show the efficiency of the proposed designs, lower bounds of the reversible BCD adders in terms of gates and garbage outputs are proposed as well.

brent1976
Fast multiple-precision evaluation of elementary functions, Richard P. Brent, Journal of the ACM, Vol. 23 #2, pp242–251, ACM Press, April 1976.
Abstract: Let f(x) be one of the usual elementary functions (exp, log, artan, sin, cosh, etc.), and let M(n) be the number of single-precision operations required to multiply n-bit integers. It is shown that f(x) can be evaluated, with relative error O(2^-n), in O(M(n)log(n)) operations, for any floating-point number x (with an n-bit fraction) in a suitable finite interval. From the Schönhage-Strassen bound on M(n), it follows that an n-bit approximation to f(x) may be evaluated in O(n(log(n))²loglog(n)) operations. Special cases include the evaluation of constants such as pi, e, and e^pi. The algorithms depend on the theory of elliptic integrals, using the arithmetic-geometric

busa2001
The IBM z900 Decimal Arithmetic Unit, Fadi Y. Busaba, Christopher A. Krygowski, Wen H. Li, Eric M. Schwarz, and Steven R. Carlough, Conference Record of the 35th Asilomar Conference on Signals, Systems and Computers, Vol. 2, ISBN 0-7803-7147-X, pp1335–1339, IEEE, November 2001.
Abstract: As the cost for adding function to a processor continues to decline, processor designs are including many additional features. An example of this trend is the appearance of graphics engines and compression engines on midrange and even low end microprocessors. One area that has the potential to capture chip real estate is the decimal arithmetic engine because of its importance in financial and business applications. Studies show that 55% of the numeric data stored on commercial databases are in decimal format. Although decimal arithmetic is supported in many software languages it is not yet available on many microprocessors. This paper details the decimal arithmetic engine in the recently announced z900 microprocessor.
Note: IEEE cat #01ch37256.

busa2004
The Design of the Fixed Point Unit for the z990 Microprocessor, Fadi Y. Busaba, Timothy Slegel, Steven R. Carlough, Christopher A. Krygowski, and John G Rell, Proceedings of the 14th ACM Great Lakes symposium on VLSI, ISBN 1-58113-853-9, pp364–367, ACM Press, 2004.
Abstract: The paper presents the design of the Fixed Point Unit (FXU) for the IBM eServer z990 microprocessor (announced in 2Q ’03) that runs at 1.2 GHz. The FXU is capable of executing two Register-Memory instructions including arithmetic instructions and a branch instruction in a single cycle. The FXU executes a total of 369 instructions that operate on variable size operands (1 to 256 bytes). The instruction set includes decimal arithmetic with multiplies and divides, binary arithmetic, shifts and rotates, loads/stores, branches, long moves, logical operations, convert instructions, and other special instructions. The FXU consists of a 64-bit dataflow stack that is custom designed and a control stack that is synthesized. The current FXU is the first superscalar design for the CMOS z-series machines, has a new improved decimal unit, and has for the first time a 16x64 bit binary multiplier.

castell2006
A 64-bit Decimal Floating-Point Comparator, Ivan D. Castellanos and James E. Stine, IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06), pp138–144, IEEE, 2006.
Abstract: Decimal arithmetic is growing in importance as scientific studies reveal that current financial and commercial applications spend a high percentage overhead in this type of calculations. Typically, software is utilized to emulate decimal floating point arithmetic in these applications. On the other hand, functional units that employ decimal floating point hardware can improve performance by two or three orders of magnitude. This paper presents the design and implementation of a novel decimal floating-point comparator compliant with the current draft revision of the IEEE-754 Standard for floating-point arithmetic. It utilizes a novel BCD magnitude comparator with logarithmic delay and it supports 64-bit decimal floating-point numbers. Area and delay results are examined for an implementation in TSMC SCN6M SCMOS technology.

castell2008
Compressor trees for decimal partial product reduction, Ivan D. Castellanos and James E. Stine, Proceedings of the 18th ACM Great Lakes symposium on VLSI, ISBN 978-1-59593-999-9, pp107–110, ACM Press, 2008.
Abstract: Decimal multiplication has grown in interest due to the recent announcement of new IEEE 754R standards and the availability of high-speed decimal computation hardware. Prior research enabled partial products to be coded more efficiently for their use in radix 10 architectures. This paper clarifies previous techniques for partial product reduction using carry-save adders and presents a new 4:2 compressor structure. This new structure improves performance at the expense of more gates, however, regularity is introduced into the circuit to promote implementations in Very Large Scale Integration (VLSI) Designs. Results are presented and compared for several designs using a TSMC SCN6M 0.18 µm feature size.

chroust1981
Method of Adding Decimal Numbers by Means of Binary Arithmetic, G. Chroust, IBM Technical Disclosure Bulletin, 03-81, pp4525–4526, IBM, March 1981.
Abstract: The simulation of decimal arithmetic on a machine without packed arithmetic necessitates a method for simulating decimal addition by binary arithmetic.
    Decimal addition simulation is effected by simultaneously applying the following steps to as many digits (d1, d2, ..., dn) of the decimal number as fit into the (binary = bin) word length of the object machine. 1. (Binary) addition of the two operands, 2. adding a ‘6’ in each digit position (this generates the correct carry), and 3. subtracting a ‘6’ in those places from which no carry resulted.
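    The three steps in this abstract can be sketched in Python for packed BCD held in an ordinary integer. This is a minimal illustration of the add-6/subtract-6 technique, not code from the disclosure; the function name bcd_add and its digits parameter are hypothetical, and the carry detection by exclusive-or is one common way to find the digit positions that produced no carry.

```python
def bcd_add(a: int, b: int, digits: int = 8) -> int:
    """Add two packed-BCD integers (4 bits per decimal digit)."""
    sixes = int("6" * digits, 16)             # a 6 in every digit position
    carry_bits = int("1" * digits, 16) << 4   # the bit just above each digit

    biased = a + sixes                 # step 2: add 6 to every digit
    total = biased + b                 # step 1: ordinary binary addition
    carries = total ^ biased ^ b       # carry into every bit position
    no_carry = ~carries & carry_bits   # digits that produced no carry
    fix = (no_carry >> 2) | (no_carry >> 3)   # a 6 under each such digit
    return total - fix                 # step 3: take the 6 back out


print(hex(bcd_add(0x1234, 0x5678, digits=4)))   # 0x6912
print(hex(bcd_add(0x9999, 0x0001, digits=4)))   # 0x10000
```

    Because Python integers are unbounded, the carry out of the most significant digit is kept; on a fixed-width machine word it would be lost or would need a spare digit position.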

cody1980
Software Manual for the Elementary Functions, W. J. Cody and W. Waite, ISBN 0-13-822064-6, 269pp, Prentice-Hall, 1980.

cowlis2003
Decimal Floating-Point: Algorism for Computers, Michael F. Cowlishaw, Proceedings of the 16th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-1894-X, pp104–111, IEEE, June 2003.
Abstract: Decimal arithmetic is the norm in human calculations, and human-centric applications must use a decimal floating-point arithmetic to achieve the same results.
    Initial benchmarks indicate that some applications spend 50% to 90% of their time in decimal processing, because software decimal arithmetic suffers a 100× to 1000× performance penalty over hardware. The need for decimal floating-point in hardware is urgent.
    Existing designs, however, either fail to conform to modern standards or are incompatible with the established rules of decimal arithmetic. This paper introduces a new approach to decimal floating-point which not only provides the strict results which are necessary for commercial applications but also meets the constraints and requirements of the IEEE 854 standard.
    A hardware implementation of this arithmetic is in development, and it is expected that this will significantly accelerate a wide variety of applications.
Note: Softcopy is available in PDF.

crensh1998
Integer Square Roots, Jack W. Crenshaw, Embedded Systems Programming, Vol. 11 #2, EDTN, February 1998.

dadda2007
Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach, Luigi Dadda, IEEE Transactions on Computers, Vol. 56 #10, ISSN 0018-9340, pp1320–1328, IEEE, October 2007.
Abstract: Decimal arithmetic has been in recent years revived due to the large amount of data in commercial applications. We consider the problem of Multi Operand Parallel Decimal Addition with an approach that uses binary arithmetic, suggested by the adoption of BCD numbers. This involves corrections in order to obtain the BCD result, or a binary to decimal conversion. We adopt the latter approach, particularly efficient for a large number of addends. Conversion requires a relatively small area and can afford fast operation. The BD conversion, moreover, allows an easy alignment of the sums of adjacent columns. We treat the design of BCD digit adders using fast carry free adders and the conversion problem through a known parallel scheme using elementary conversion cells. Spreadsheets have been developed for adding several BCD digits and for simulating the binary to decimal conversion as design tool.

dietmeyer1968
Generating prime implicants via ternary encoding and decimal arithmetic, D. L. Dietmeyer and J. R. Duley, Communications of the ACM, Vol. 11 #7, ISSN 0001-0782, pp520–523, ACM Press, July 1968.
Abstract: Decimal arithmetic, ternary encoding of cubes, and topological considerations are used in an algorithm to obtain the extremals and prime implicants of Boolean functions. The algorithm, which has been programmed in the FORTRAN language, generally requires less memory than other minimization procedures, and treats DON’T CARE terms in an efficient manner.

doring1997
Decimal Adjustment of Long Numbers in Constant Time, Andreas Döring and Wolfgang J. Paul, Information Processing Letters, Vol. 62 #3, pp161–163, Elsevier Science B.V., June 1997.
Abstract: We propose a very simple method for adding and subtracting n-digit binary coded decimal (BCD) numbers with a small constant number of ordinary operations of a 4n-bit binary ALU. With this method addition/subtraction of 8-digit decimal numbers on an Intel 486 processor is faster than programs that use the special built-in operations for decimal adjustment.

erle2003
Decimal Multiplication Via Carry-Save Addition, Mark A Erle and Michael J Schulte, Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, The Hague, Netherlands, pp348–358, IEEE Computer Society Press, June 2003.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents two novel designs for fixed-point decimal multiplication that utilize decimal carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses decimal carry-save addition in the iterative portion of the design is presented. Then, a second multiplier design is proposed with several notable improvements including fast generation of multiplicand multiples that do not need to be stored, the use of decimal (4:2) compressors, and a simplified decimal carry-propagate addition to produce the final product. When multiplying two n-digit operands to produce a 2n-digit product, the improved multiplier design has a worst-case latency of n + 4 cycles and an initiation interval of n + 1 cycles. Three data-dependent optimizations, which help reduce the multipliers’ average latency, are also described. The multipliers presented can be extended to support decimal floating-point multiplication.

erle2005
Decimal Multiplication With Efficient Partial Product Generation, Mark A Erle, Eric Schwarz, and Michael J Schulte, Proceedings of the 17th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2366-8, pp21–28, IEEE, June 2005.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that utilizes a simple recoding scheme to produce signed-magnitude representations of the operands thereby greatly simplifying the process of generating partial products for each multiplier digit. The partial products are generated using a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit form with two digits per position, and then combined via a combinational circuit. As the signed-digit partial products are developed one at a time while traversing the recoded multiplier operand from the least significant digit to the most significant digit, each partial product is added along with the accumulated sum of previous partial products via a signed-digit adder. This work is significantly different from other work employing digit-by-digit multipliers due to the efficiency gained by restricting the range of digits throughout the multiplication process.

erle2007
Decimal Floating-Point Multiplication Via Carry-Save Addition, Mark A. Erle, Michael J. Schulte, and Brian J. Hickmann, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp46–55, IEEE, June 2007.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of a decimal floating-point multiplier that complies with specifications for decimal multiplication given in the draft revision of the IEEE 754 Standard for Floating-point Arithmetic (IEEE 754R). This multiplier extends a previously published decimal fixed-point multiplier design by adding several features including exponent generation, sticky bit generation, shifting of the intermediate product, rounding, and exception detection and handling. The core of the decimal multiplication algorithm is an iterative scheme of partial product accumulation employing decimal carry-save addition to reduce the critical path delay. Novel features of the proposed multiplier include support for decimal floating-point numbers, on-the-fly generation of the sticky bit, early estimation of the shift amount, and efficient decimal rounding. Area and delay estimates are provided for a verified Verilog register transfer level model of the multiplier.

erle2008
Algorithms and Hardware Designs for Decimal Multiplication, Mark A. Erle, 217pp, Lehigh University, November 2008.
Abstract: Although a preponderance of business data is in decimal form, virtually all floating-point arithmetic units on today’s general-purpose microprocessors are based on the binary number system. Higher performance, less circuitry, and better overall error characteristics are the main reasons why binary floating-point hardware (BFP) is chosen over decimal floating-point (DFP) hardware. However, the binary number system cannot precisely represent many common decimal values. Further, although BFP arithmetic is well-suited for the scientific community, it is quite different from manual calculation norms and does not meet many legal requirements.
   Due to the shortcomings of BFP arithmetic, many applications involving fractional decimal data are forced to perform their arithmetic either entirely in software or with a combination of software and decimal fixed-point hardware. Providing DFP hardware has the potential to dramatically improve the performance of such applications. Only recently has a large microprocessor manufacturer begun providing systems with DFP hardware. With available die area continually increasing, dedicated DFP hardware implementations are likely to be offered by other microprocessor manufacturers.
   This dissertation discusses the motivation for decimal computer arithmetic, a brief history of this arithmetic, and relevant software and processor support for a variety of decimal arithmetic functions. As the context of the research is the IEEE Standard for Floating-point Arithmetic (IEEE 754-2008) and two-state transistor technology, descriptions of the standard and various decimal digit encodings are described.
   The research presented investigates algorithms and hardware support for decimal multiplication, with particular emphasis on DFP multiplication. Both iterative and parallel implementations are presented and discussed. Novel ideas are advanced such as the use of decimal counters and compressors and the support of IEEE 754-2008 floating-point, including early estimation of the shift amount, in-line exception handling, on-the-fly sticky bit generation, and efficient decimal rounding. The iterative and parallel, decimal multiplier designs are compared and contrasted in terms of their latency, throughput, area, delay, and usage.
   The culmination of this research is the design and comparison of an iterative DFP multiplier with a parallel DFP multiplier. The iterative DFP multiplier is significantly smaller and may achieve a higher practical frequency of operation than the parallel DFP multiplier. Thus, in situations where the area available for DFP is an important design constraint, the iterative DFP multiplier may be an attractive implementation. However, the parallel DFP multiplier has less latency for a single multiply operation and is able to produce a new result every cycle. As for power considerations, the fewer overall devices in the iterative multiplier, and more importantly the fewer storage elements, should result in less leakage. This benefit is mitigated by its higher latency and lower throughput.
   The proposed implementations are suitable for general-purpose, server, and mainframe microprocessor designs. Depending on the demand for DFP in human-centric applications, this research may be employed in the application-specific integrated circuits (ASICs) market.
Note: Available at speleotrove.com.

frankl1972
Zoned Decimal Arithmetic, J. W. Franklin, IBM Technical Disclosure Bulletin, 12-72, pp2123–2124, IBM, December 1972.
Abstract: A means is described for performing arithmetic on zoned decimal data that does not require additional storage space for the intermediate result, and which preserves both operands until it is determined that the operation has been performed correctly and successfully.

gord1998
A Calculated Look at Fixed-Point Arithmetic, Robert Gordon, Embedded Systems Programming, Vol. 11 #4, pp72–78, Miller Freeman, Inc, April 1998.
Abstract: This article explores the subject of fixed-point numbers and presents techniques you can use to implement efficient, fixed-precision number applications.

hansen1994
Multiple-length Division Revisited: a Tour of the Minefield, Per Brinch Hansen, Software -- Practice and Experience, Vol. 24 #6, pp579–601, John Wiley & Sons, June 1994.
Abstract: Long division of natural numbers plays a crucial role in Cobol arithmetic, cryptography, and primality testing. Only a handful of textbooks discuss the theory and practice of long division, and none of them do it satisfactorily. This tutorial attempts to fill this surprising gap in the literature on computer algorithms. We illustrate the subtleties of long division by examples, define the problem concisely, summarize the theory, and develop a complete Pascal algorithm using a consistent terminology.

hickmann2007
A Parallel IEEE P754 Decimal Floating-Point Multiplier, Brian J. Hickmann, Andrew Krioukov, Michael J. Schulte, and Mark A. Erle, Proceedings of the IEEE International Conference on Computer Design 2007, pp296–303, IEEE, October 2007.
Abstract: Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. This design is based on a previously published parallel fixed-point decimal multiplier which uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components including exponent generation, rounding, shifting, and exception handling. Area and delay estimates are presented that show a significant latency and throughput improvement with a substantial increase in area as compared to the only published IEEE P754 compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.

hp71ref1987b
The IEEE Proposal for Handling Math Exceptions, Hewlett Packard Company, HP-71 Reference Manual, Mfg. # 0071-90110, Reorder # 0071-90010, pp338–345, Hewlett Packard Company, October 1987.
Abstract: The IEEE Radix Independent Floating-Point Proposal divides all of the floating-point “exceptional events” encountered in calculations into five classes of math exceptions: invalid operation, division by zero, overflow, underflow, and inexact result. Associated with each math exception is a flag that is set by the HP-71 whenever an exception is encountered. These flags remain set until you clear them. Each of these flags can be accessed by its number or its name.
Note: First edition October 1983. Manual available from The Museum of HP Calculators (www.hpmuseum.org).
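The five exception classes and their sticky flags have a close analogue in Python's standard decimal module, which makes the behaviour easy to try out. The sketch below is only an illustration of that flag model; the context settings (12 digits, exponent limits of ±499, all traps disabled) are assumptions chosen for the example and are not taken from the manual.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow, Underflow)

FIVE = (InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact)

# Assumed settings: no traps, so every exception only sets its flag.
ctx = Context(prec=12, Emax=499, Emin=-499, traps=[])


def raised(context):
    """Names of the five classic exception flags currently set."""
    return [sig.__name__ for sig in FIVE if context.flags[sig]]


ctx.divide(Decimal(1), Decimal(0))      # division by zero
ctx.sqrt(Decimal(-1))                   # invalid operation
print(raised(ctx))   # ['InvalidOperation', 'DivisionByZero']

ctx.divide(Decimal(1), Decimal(3))      # inexact result
print(raised(ctx))   # earlier flags stay set until cleared

ctx.clear_flags()
print(raised(ctx))   # []

ctx.multiply(Decimal("1E+400"), Decimal("1E+400"))   # overflow
ctx.multiply(Decimal("1E-400"), Decimal("1E-400"))   # underflow
print(raised(ctx))   # ['Overflow', 'Underflow', 'Inexact']
```

The decimal module also defines further signals (Rounded, Subnormal, Clamped) that are outside the five classes listed in the abstract, so the helper above deliberately ignores them.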

hull1978
Desirable Floating-Point Arithmetic and Elementary Functions for Numerical Computation, T. E. Hull, ACM Signum Newsletter, Vol. 14 #1 (Proceedings of the SIGNUM Conference on the Programming Environment for Development of Numerical Software), pp96–99, ACM Press, 1978.
Abstract: The purpose of this talk is to summarize proposed specifications for floating-point arithmetic and elementary functions. The topics considered are: the base of the number system, precision control, number representation, arithmetic operations, other basic operations, elementary functions, and exception handling. The possibility of doing without fixed-point arithmetic is also mentioned. The specifications are intended to be entirely at the level of a programming language such as Fortran. The emphasis is on convenience and simplicity from the user’s point of view. Conforming to such specifications would have obvious beneficial implications for the portability of numerical software, and for proving programs correct, as well as attempting to provide facilities which are most suitable for the user. The specifications are not complete in every detail, but it is intended that they be complete “in spirit” – some further details, especially syntactic details, would have to be provided, but the proposals are otherwise relatively complete.
Note: Also in Proceedings of the IEEE 4th Symposium on Computer Arithmetic pp63-69.

hull1985b
Properly Rounded Variable Precision Square Root, T. E. Hull and A. Abrham, ACM Transactions on Mathematical Software, Vol. 11 #3, pp229–237, ACM Press, September 1985.
Abstract: The square root function presented here returns a properly rounded approximation to the square root of its argument, or it raises an error condition if the argument is negative. Properly rounded means rounded to nearest, or to nearest even in case of a tie. It is variable precision in that it is designed to return a p-digit approximation to a p-digit argument, for any p > 0. (Precision p means p decimal digits.) The program and the analysis are valid for all p > 0, but current implementations place some restrictions on p.
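The interface described here (a p-digit result for any p > 0, with an error for negative arguments) can be tried with Python's standard decimal module, whose sqrt operation is specified to round to nearest. The sketch below shows that behaviour only; it is not the algorithm of the paper, and the helper name sqrt_p is hypothetical.

```python
from decimal import Context, Decimal, InvalidOperation


def sqrt_p(x: str, p: int) -> Decimal:
    """Square root of the decimal literal x, rounded to p digits."""
    ctx = Context(prec=p)      # new contexts trap InvalidOperation by default
    return ctx.sqrt(Decimal(x))


print(sqrt_p("2", 5))     # 1.4142
print(sqrt_p("2", 10))    # 1.414213562
print(sqrt_p("16", 5))    # 4

try:
    sqrt_p("-1", 5)
except InvalidOperation:
    print("negative argument rejected")
```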

hull1986
Variable Precision Exponential Function, T. E. Hull and A. Abrham, ACM Transactions on Mathematical Software, Vol. 12 #2, pp79–91, ACM Press, June 1986.
Abstract: The exponential function presented here returns a result which differs from e^x by less than one unit in the last place, for any representable value of x which is not too close to values for which e^x would overflow or underflow. (For values of x which are not within this range, an error condition is raised.) It is a “variable precision” function in that it returns a p-digit approximation for a p-digit argument, for any p > 0 (p-digit means p-decimal-digit). The program and analysis are valid for all p > 0, but current implementations place a restriction on p. The program is presented in a Pascal-like programming language called Numerical Turing which has special facilities for scientific computing, including precision control, directed roundings, and built-in functions for getting and setting exponents.

ibm1998
Decimal Arithmetic Instructions, IBM, ESA/390 Principles of Operation, Chapter 8, IBM, 1998.
Abstract: The decimal instructions of this chapter perform arithmetic and editing operations on decimal data. Additional operations on decimal data are provided by several of the instructions in Chapter 7, “General Instructions”. Decimal operands always reside in storage, and all decimal instructions use the SS instruction format. Decimal operands occupy storage fields that can start on any byte boundary.

james2007
Quick Addition of Decimals Using Reversible Conservative Logic, Rekha K. James, Shahana T. K., K. Poulose Jacob, and Sreela Sasi, 15th International Conference on Advanced Computing and Communications (ADCOM 2007), ISBN 0-7695-3059-1, pp191–196, IEEE Computer Society, December 2007.
Abstract: In recent years, reversible logic has emerged as one of the most important approaches for power optimization with its application in low power CMOS, nanotechnology and quantum computing. This research proposes quick addition of decimals (QAD) suitable for multi-digit BCD addition, using reversible conservative logic. The design makes use of reversible fault tolerant Fredkin gates only. The implementation strategy is to reduce the number of levels of delay, thereby increasing the speed, which is the most important factor for high speed circuits.

jimeno2008
A BCD-based architecture for fast coordinate rotation, Antonio Jimeno, Higinio Mora, Jose L. Sanchez, and Francisco Pujol, Journal of Systems Architecture: the EUROMICRO Journal, Vol. 54 #8, ISSN 1383-7621, pp829–840, Elsevier, August 2008.
Abstract: Although radix 10 based arithmetic has been gaining renewed importance over the last few years, decimal systems are not efficient enough and techniques are still under development. In this paper, an improvement of the CORDIC (coordinate rotation digital computer) method for decimal representation is proposed and applied to produce fast rotations. The algorithm uses BCD operands as inputs, combining the advantages of both decimal and binary systems. The result is a reduction of 50% in the number of iterations if compared with the original Decimal CORDIC method. Finally, we present a hardware architecture useful to produce BCD coordinates rotations accurately and fast, and different experiments demonstrating the advantages of the new method are shown. A reduction of 75% in a single stage delay is obtained, whereas the circuit area just increases in about 5%.

jones1962
¿Web? Floating Point Feature On The IBM Type 1620, F. B. Jones and A. W. Wymore, IBM Technical Disclosure Bulletin, 05-62, pp43–46, IBM, May 1962.
Abstract: In the type 1620 automatic floating point operations, a floating point number is a field consisting of a variable length mantissa and a two digit exponent. The exponent is in the two low order positions of the field, and the mantissa is in the remaining high order positions, |M.....M|EE.
    The most significant digit positions are marked by flags and the algebraic signs are marked by flags over the least significant digit positions. The exponent is established on the premise that the mantissa is less than 1.0 and equal to or greater than 0.1, and has a range of -99 to +99. The smallest positive quantity that can be represented is thus 00.... 099. The mantissa may have from two to one hundred digits. ...

kahan1983
URL
¿Web? Mathematics Written in Sand, W. Kahan, Proc. Joint Statistical Mtg. of the American Statistical Association, pp12–26, American Statistical Association, 1983.
Abstract: Simplicity is a Virtue; yet we continue to cram ever more complicated circuits ever more densely into silicon chips, hoping all the while that their internal complexity will promote simplicity of use. This paper exhibits how well that hope has been fulfilled by several inexpensive devices widely used nowadays for numerical computation. One of them is the Hewlett-Packard hp-15C programmable shirtpocket calculator, on which only a few keys need be pressed to perform tasks like these:
    Real and Complex arithmetic, including the elementary transcendental functions and their inverses; Matrix arithmetic including inverse, transpose, determinant, residual, norms, prompted input/output and complex-real conversion; Solve an equation and evaluate an Integral numerically; simple statistics; Γ and combinatorial functions; ...
    For instance, a stroke of its [1/X] key inverts an 8x8 matrix of 10-sig.-dec. numbers in 90 sec.
    This calculator costs under $100 by mail-order. Mathematically dense circuitry is also found in Intel’s 8087 coprocessor chip, currently priced below $200, which has for two years augmented the instruction repertoire of the 8086 and 8088 microcomputer chips to cope with ...
    Three binary floating-point formats 32, 64 and 80 bits wide; three binary integer formats 16, 32 and 64 bits wide; 18-digit BCDecimal integers; rational arithmetic, square root, format conversion and exception handling all in conformity with p754, the proposed IEEE arithmetic standard (see “Computer” Mar. 1, 1981); the kernels of transcendental functions exp, log, tan and arctan; and an internal stack of eight registers each 80 bits wide.
    For instance, the 8087 has been used to invert a 100x100 matrix of 64-bit floating-point numbers in 90 sec. Among the machines that can use this chip are the widely distributed IBM Personal Computers, each containing a socket already wired for an 8087. Several other manufacturers now produce arithmetic engines that, like the 8087, conform to the proposed IEEE arithmetic standard, so software that exploits its refined arithmetic properties should be widespread soon.
    As sophisticated mathematical operations come into use ever more widely, mathematical proficiency appears to rise; in a sense it actually declines. Computations formerly reserved for experts lie now within reach of whoever might benefit from them regardless of how little mathematics he understands; and that little is more likely to have been gleaned from handbooks for calculators and personal computers than from professors. This trend is pronounced among users of financial calculators like the hp-12C. Such trends ought to affect what and how we teach, as well as how we use mathematics, regardless of whether large fast computers, hitherto dedicated mostly to speed, ever catch up with some smaller machines’ progress towards mathematical robustness and convenience.

karpin1925
¿Web? The History of Arithmetic, Louis Charles Karpinski, 200pp, Rand McNally & Company, 1925.
Abstract: The purpose of this book is to present the development of arithmetic as a vital and integral part of the history of civilization. Particular attention is paid to the material of arithmetic which continues to be taught in our elementary schools and to the historical phases of that work with which the teacher of arithmetic should be familiar...
Note: Reprint: Russell & Russell, New York, 1965.

kautz1958
¿Web? Binary and truth-function operations on a decimal computer with an extract command, William H. Kautz, Communications of the ACM, Vol. 1 #5, pp12–13, ACM Press, May 1958.
Abstract: It occasionally becomes desirable to solve, on automatic digital computing machines which are capable of handling only decimal numbers, problems in logic, class structure, coding, binary relations or binary arithmetic. This note describes how the major logical and binary operations can be carried out on one such machine, the DATATRON 205, without any circuit modifications to the computer. These procedures would be applicable with little modification to any decimal computer with an extract command, however.

keir1975a
¿Web? Programmer-controlled roundoff and the selection of a stable roundoff rule, R. A. Keir, Conf. Rec. 3rd Symp. Comp. Arithmetic CH1017-3C, pp73–76, IEEE Computer Society, 1975.
Abstract: The author suggests that every computer with floating-point addition and subtraction should have PSW controllable roundoff facilities. Yohe’s catalog should be included. There should also be a stable roundoff mode using the round-to-odd or round-to-even rule based on whether the radix is divisible by four or only by two.

keir1975b
¿Web? Compatible number representations, R. A. Keir, Conf. Rec. 3rd Symp. Comp. Arithmetic CH1017-3C, pp82–87, IEEE Computer Society, 1975.
Abstract: A compatible number system for mixed fixed-point and floating-point arithmetic is described in terms of number formats and opcode sequences (for hardwired or microcoded control). This inexpensive system can be as fast as fixed-point arithmetic on integers, is faster than normalized arithmetic in floating point, gets answers identical to those of normalized arithmetic, and automatically satisfies the Algol-60 mixed-mode rules. The central concept is the avoidance of meaningless “normalization” following arithmetic operations. Adoption of this system could lead to simpler compilers.

keir1975c
¿Web? Should the stable rounding rule be radix-dependent?, Roy A. Keir, Information Processing Letters, Vol. 3 #6, pp188–189, Elsevier, July 1975.
Abstract: (None.)

kenney2004a
¿Web? Multioperand Decimal Addition (extended version), Robert D Kenney and Michael J Schulte, Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Lafayette, LA, February 2004, 10pp, IEEE, February 2004.
Abstract: This paper introduces and analyzes four techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Three of the techniques speculate BCD correction values and use chaining to correct intermediate results. The first speculates over one addition. The second speculates over two additions. The third employs multiple instances of the second technique in parallel and then merges the results. The fourth technique uses a binary carry-save adder tree and produces a binary sum. Combinational logic is then used to correct the sum and determine the carry into the next digit. Multioperand adder designs are constructed and synthesized for four to sixteen input operands. Analyses are performed on the synthesis results and the merits of each technique are discussed. Finally, these techniques are compared to previous attempts made at speeding up decimal addition.

kenney2004b
¿Web? High-Frequency Decimal Multiplier, Robert D Kenney, Michael J Schulte, and Mark A. Erle, Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, ISBN 0 7695 2231 9, pp26–29, IEEE, October 2004.
Abstract: Decimal arithmetic is regaining popularity in the computing community due to the growing importance of commercial, financial, and Internet-based applications, which process decimal data. This paper presents an iterative decimal multiplier, which operates at high clock frequencies and scales well to large operand sizes. The multiplier uses a new decimal representation for intermediate products, which allows for a very fast two-stage iterative multiplier design. Decimal multipliers, which are synthesized using a 0.11 micron CMOS standard cell library, operate at clock frequencies close to 2 GHz. The latency of the proposed design to multiply two n-digit BCD operands is (n + 8) cycles with a new multiplication able to begin every (n + 1) cycles.

kenney2005
¿Web? High-speed multioperand decimal adders, R.D. Kenney and M. J. Schulte, IEEE Transactions on Computers, Vol. 54 #8, ISSN 0018-9340, pp953–963, IEEE, August 2005.
Abstract: There is increasing interest in hardware support for decimal arithmetic as a result of recent growth in commercial, financial, and Internet-based applications. Consequently, new specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic. This paper introduces and analyzes three techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Two of the techniques speculate BCD correction values and correct intermediate results while adding the input operands. The first speculates over one addition. The second speculates over two additions. The third technique uses a binary carry-save adder tree and produces a binary sum. Combinational logic is then used to correct the sum and determine the carry into the next more significant digit. Multioperand adder designs are constructed and synthesized for four to 16 input operands. Analyses are performed on the synthesis results and the merits of each technique are discussed. Finally, these techniques are compared to several previous techniques for high-speed decimal addition.
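Note: As an illustration only (not taken from the paper), the idea of forming a binary sum for each digit column and then correcting it into a decimal digit plus a carry can be sketched in software. The function names and the digit-list representation (least significant digit first) are assumptions made for this sketch; the hardware designs in the paper use carry-save adder trees and combinational correction logic rather than a loop.

```python
def multioperand_bcd_add(operands):
    """Add several decimal numbers given as digit lists (least significant first)."""
    width = max(len(op) for op in operands)
    result = []
    carry = 0
    for i in range(width):
        # Plain binary sum of one digit column plus the incoming carry.
        column = carry + sum(op[i] for op in operands if i < len(op))
        # Correction step: split the binary column sum into a decimal digit
        # and the carry into the next more significant digit.
        carry, digit = divmod(column, 10)
        result.append(digit)
    # Flush any remaining carry into extra high-order digits.
    while carry:
        carry, digit = divmod(carry, 10)
        result.append(digit)
    return result


def to_digits(n):
    """Hypothetical helper: non-negative int -> digit list, least significant first."""
    return [int(c) for c in reversed(str(n))]


def from_digits(digits):
    """Hypothetical helper: digit list (least significant first) -> int."""
    return int("".join(str(d) for d in reversed(digits)))
```

With more than two operands the carry out of a column can exceed 1, which is why the sketch keeps the carry as a general integer rather than a single bit.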

kim2006
¿Web? A Hybrid Decimal Division Algorithm Reducing Computational Iterations, Yong-Dae Kim, Soon-Youl Kwon, Seon-Kyoung Han, Kyoung-Rok Cho, and Younggap You, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences Vol. E89-A #6, pp1807–1812, The Institute of Electronics, Information and Communication Engineers, 2006.
Abstract: This paper presents a hybrid decimal division algorithm to improve division speed. The proposed hybrid algorithm employs either non-restoring or restoring algorithm on each digit to reduce iterative computations. The selection of the algorithm is based on the relative remainder values with respect to the half of its divisor. The proposed algorithm requires maximum 7n+4 add/subtract operations for an n-digit quotient, whereas other restoring or non-restoring schemes comprise more than 10n+1 operations.

kleinsteiber1980
¿Web? IBM 4341 hardware/microcode trade-off decisions, James R. Kleinsteiber, MICRO 13: Proceedings of the 13th annual workshop on Microprogramming, pp190–192, ACM Press, December 1980.
Abstract: The design of IBM’s 4341 Processor, as with other processors, involved many cost/performance tradeoffs. The designer is continually under pressure to increase processor speed without increasing cost or to decrease processor cost without decreasing performance. This paper will examine some of the engineering decisions that were made in the attempt to make the 4341 a high-performing yet low cost processor. These decisions include searching for, or developing, algorithms that make the best use of hardware properties, such as data path width, arithmetic/logical operations and special functions. Functions were sought such that a small amount of added hardware would go a long way towards improving system performance. Hardware designers, microcoders and performance analysis people worked together to implement instructions, functions and algorithms with the proper mixture of hardware functions and microcode in order to build a viable processor. Some specific functions will be covered to examine a few of the decisions. The TEST UNDER MASK performance problem will be discussed with its resulting implementation decision. The method of using EXCLUSIVE OR to clear storage and the resulting algorithm design will be shown. Other topics to be discussed include multiple hardware functions and the resulting effect on floating point, fixed point and decimal multiply; the divide function and its effect on floating point and fixed point divide; and the effect of an 8-byte data path for decimal arithmetic.
Note: Also published in December 1980 SIGMICRO Newsletter Volume 11 Issue 3-4

knuth1998
URL
¿Web? The Art of Computer Programming, Vol 2, Donald E. Knuth, ISBN 0-201-89684-2, 762pp, Addison Wesley Longman, 1998.
Abstract: The chief purpose of this chapter [4] is to make a careful study of the four basic processes of arithmetic: addition, subtraction, multiplication, and division. Many people see arithmetic as a trivial thing that children learn and computers do, but we will see that arithmetic is a fascinating topic with many interesting facets. ...
Note: Third edition. See especially sections 4.1 through 4.4.

lang2006
¿Web? A Radix-10 Combinational Multiplier, Tomás Lang and Alberto Nannarelli, Proceedings of 40th Asilomar Conference on Signals, Systems, and Computers, pp313–317, IEEE, October 2006.
Abstract: In this work, we present a combinational decimal multiply unit which can be pipelined to reach the desired throughput. With respect to previous implementations of decimal multiplication, the proposed unit is combinational (parallel) and not sequential, has a simpler recoding of the operands which reduces the number of partial product precomputations and uses counters to eliminate the need of the decimal equivalent of a 4:2 adder. The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.

lang2007
URL
¿Web? A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture, Tomás Lang and Alberto Nannarelli, IEEE Transactions on Computers, Vol. 56 #6, pp727–739, IEEE, June 2007.
Abstract: In this work, we present a radix-10 division unit that is based on the digit-recurrence algorithm. The previous decimal division designs do not include recent developments in the theory and practice of this type of algorithm, which were developed for radix-2^k dividers. In addition to the adaptation of these features, the radix-10 quotient digit is decomposed into a radix-2 digit and a radix-5 digit in such a way that only five and two times the divisor are required in the recurrence. Moreover, the most significant slice of the recurrence, which includes the selection function, is implemented in radix-2, avoiding the additional delay introduced by the radix-10 carry-save additions and allowing the balancing of the paths to reduce the cycle delay. The results of the implementation of the proposed radix-10 division unit show that its latency is close to that of radix-16 division units (comparable dynamic range of significands) and it has a shorter latency than a radix-10 unit based on the Newton-Raphson approximation.

lee1989
URL
¿Web? Multistep Gradual Rounding, Corinna Lee, IEEE Transactions on Computers, Vol. 38 #4, pp595–600, IEEE, April 1989.
Abstract: A value V is to be rounded to an arbitrary precision resulting in the value V". Conventional rounding technique uses one step to accomplish this. Alternatively, multistep rounding uses several steps to round the value V to successively shorter precisions with the final rounding step producing the desired value V". This alternate rounding method is one way to implement, with the minimum of hardware, the denormalization process that the IEEE Floating-Point Standard 754 requires when underflow occurs. There are certain cases for which multistep rounding produces a different result than single-step rounding. To prevent such a step error, the author introduces a rounding procedure called gradual rounding that is very similar to conventional rounding with the addition of two tag bits associated with each floating-point register.
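Note: The step error mentioned in the abstract can be reproduced with a small decimal example. This is an illustrative sketch using Python's standard decimal module, not the paper's gradual-rounding procedure (the tag bits are not modelled); the helper name round_in_steps and the choice of round-half-up are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_in_steps(value, quanta, rounding=ROUND_HALF_UP):
    """Round `value` successively to each quantum in `quanta` (hypothetical helper)."""
    for quantum in quanta:
        value = value.quantize(quantum, rounding=rounding)
    return value


v = Decimal("2.449")
# Single-step rounding to one fractional digit.
single = round_in_steps(v, [Decimal("0.1")])
# Multistep rounding: first to two fractional digits, then to one.
multi = round_in_steps(v, [Decimal("0.01"), Decimal("0.1")])
print(single, multi)  # 2.4 2.5
```

The first step rounds 2.449 up to 2.45, which creates a tie that the second step rounds up again, so the multistep result differs from the single-step result.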

mano1965
¿Web? Pracniques: simulation of Boolean functions in a decimal computer, M. Morris Mano, Communications of the ACM, Vol. 8 #1, ISSN 0001-0782, pp39–40, ACM Press, January 1965.
Abstract: A method is presented here for simulating logical functions in a digital computer by means of simple arithmetic and control instructions. This method is of practical value when the computer used does not have built-in logical instructions.
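Note: One common way to express logical functions with arithmetic alone, on operands restricted to 0 and 1, is sketched here. These identities are standard; whether they match the instruction sequences used in the paper is an assumption, and the function names are illustrative.

```python
def b_not(a):
    """Logical NOT of a 0/1 value using subtraction only."""
    return 1 - a


def b_and(a, b):
    """Logical AND of two 0/1 values using multiplication only."""
    return a * b


def b_or(a, b):
    """Logical OR: the sum, minus the product so that 1 OR 1 stays 1."""
    return a + b - a * b


def b_xor(a, b):
    """Exclusive OR: the sum, minus twice the product so that 1 XOR 1 gives 0."""
    return a + b - 2 * a * b
```

On a decimal machine the same identities can be applied digit by digit to words whose digits are all 0 or 1, provided no operation generates a carry between digits.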

mosh1954
¿Web? The Generation of Pseudo-Random Numbers on a Decimal Calculator, Jack Moshman, Journal of the ACM Vol. 1 #2, pp88–91, ACM Press, April 1954.
Abstract: (None.) Describes the generation of pseudo-random numbers on the decimal UNIVAC machine.

moshier1989
¿Web? Methods and Programs for Mathematical Functions, Stephen L. Moshier, 415pp, Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, USA, 1989.
Abstract: This book provides a working collection of mathematical software for computing various elementary and higher functions. It also supplies tutorial information of a practical nature; the purpose of this is to assist in constructing numerical programs for the reader’s special applications.
    Though some of the main analytical techniques for deriving functional expansions are described, the emphasis is on computing; so there has been no attempt to incorporate or supplant the many books on functional and numerical analysis that are available. ...
Note: Program source codes are available at http://www.netlib.org/cephes.

moskal2007
¿Web? Design and Synthesis of a Carry-Free Signed-Digit Decimal Adder, John Moskal, Erdal Oruklu, and Jafar Saniie, IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp1089–1092, IEEE, May 2007.
Abstract: The decimal arithmetic has been receiving an increased attention because of the growth of financial and scientific applications requiring high precision and increased computing power. This paper presents an efficient architecture for multi-digit decimal addition based on carry-free signed-digit numbers. In this study, the decimal adder architecture has been designed and synthesized using the TSMC 0.18 µm technology. The synthesis results were compared to the existing decimal adders with respect to design area, delay and power consumption. These results show that proposed adder architecture improves the area-delay factor by 3 for a 32 digit adder.

nikmehr2004
¿Web? A decimal carry-free adder, Hooman Nikmehr, Braden Phillips, and Cheng-Chew Lim, SPIE Symposium Smart Materials, Nano-, and Micro-Smart Systems, Proceedings of SPIE Vol. 5649, 12pp, SPIE International Society for Optical Engineering, December 2004.
Abstract: Recently, decimal arithmetic has become attractive in the financial and commercial world including banking, tax calculation, currency conversion, insurance and accounting. Although computers are still carrying out decimal calculation using software libraries and binary floating-point numbers, it is likely that in the near future, all processors will be equipped with units performing decimal operations directly on decimal operands. One critical building block for some complex decimal operations is the decimal carry-free adder. This paper discusses the mathematical framework of the addition, introduces a new signed-digit format for representing decimal numbers and presents an efficient architectural implementation. Delay estimation analysis shows that the adder offers improved performance over earlier designs.

nikmehr2006
¿Web? Fast Decimal Floating-Point Division, Hooman Nikmehr, Braden Phillips, and Cheng-Chew Lim, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14 #9, ISSN 1063-8210, pp951–961, IEEE, September 2006.
Abstract: A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division (named after D. Sweeney, J. E. Robertson, and T. D. Tocher), with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literature.

obai1992
¿Web? A Decimal Multiplication Algorithm for Microcomputers, Mohammad S. Obaidat and Saleh A. Bleha, Computers and Electrical Engineering, Vol. 18 #5, pp357–363, Elsevier, September 1992.
Abstract: A decimal multiplication algorithm is developed and its implementation for microcomputers is illustrated. The algorithm can provide an average multiplication speedup equal to 1.34 compared to the traditional algorithm that is based on repeated additions if both are implemented in pure hardware. The average speedup of the developed algorithm is 1.20 if implemented on an 8-bit microcomputer system. The algorithm is significant especially for simple real-time applications that require cost-effective designs.

rich1955
¿Web? Arithmetic Operations in Digital Computers, R. K. Richards, ISBN (none), 397pp, D. Van Nostrand Co., NY, 1955.
Abstract: Among the first things that are learned in a study of mathematics are rules and procedures for performing basic arithmetic operations, notably addition, subtraction, multiplication, and division. The rules and procedures taught in school are, for the most part, aimed at making the operations as simple and speedy as possible when a pencil and a piece of paper are the only tools. In the design of more elaborate arithmetical tools, it is usually found necessary or at least highly desirable to devise new methods for executing the various arithmetic operations. ...
Note: Library of Congress No. 55-6234. Bibliography 9pp.

rich1973
¿Web? Variable-Precision Exponentiation, P. L. Richman, Communications of the ACM, Vol. 16 #1, pp38–40, ACM Press, January 1973.
Abstract: A previous paper presented an efficient algorithm, called the Recomputation Algorithm, for evaluating a rational expression to within any desired tolerance on a computer which performs variable-precision arithmetic operations. The Recomputation Algorithm can be applied to expressions involving any variable-precision operations having O(10^-p + Σ|e_i|) error bounds, where p denotes the operation’s precision and e_i denotes the error in the operation’s ith argument. This paper presents an efficient variable-precision exponential operation with an error bound of the above order. Other operations, such as log, sin, and cos, which have simple series expansions, can be handled similarly.

ris1976
¿Web? A Unified Decimal Floating-Point Architecture for the Support of High-Level Languages, Frederic N. Ris, ACM SIGNUM Newsletter, Vol. 11 #3, pp18–23, ACM Press, October 1976.
Abstract: This paper summarizes a proposal for a decimal floating-point arithmetic interface for the support of high-level languages, consisting both of the arithmetic operations observed by application programs and facilities to produce subroutine libraries accessible from these programs. What is not included here are the detailed motivations, examinations of alternatives, and implementation considerations which will appear in the full work.
Note: Also in ACM SIGARCH Computer Architecture News, Vol 5 #4, pp21-31, October 1976. Also in ACM SIGPLAN Notices, Vol 12 #9, pp60-70, September 1977. Also in IBM RC 6203 (#26651) 11pp, September 1976.

sacks1982
¿Web? Applications of Redundant Number Representations to Decimal Arithmetic, R. Sacks-Davis, The Computer Journal, Vol. 25 #4, pp471–477, November 1982.
Abstract: A decimal arithmetic unit is proposed for both integer and floating-point computations. To achieve comparable speed to a binary arithmetic unit, the decimal unit is based on a redundant number representation. With this representation no loss of compactness is made relative to binary coded decimal (BCD) form. In this paper the hardware required for the implementation of the basic operations of addition, subtraction, multiplication and division are described and the properties of floating-point arithmetic based on a redundant number representation are investigated.

schmid1974
¿Web? Decimal Computation, Hermann Schmid, ISBN 047176180X, 266pp, Wiley, 1974.
Abstract: This book is thus a collection, a catalog, and a review of BCD computation techniques. The book describes how each of the most common arithmetic and transcendental operations can be implemented in a variety of ways. ... covers ... A review of number systems, BCD codes, of early calculating instruments and electronic calculating machines ... An outline of BCD computing circuit applications in the automotive, consumer, education, and entertainment fields, illustrated with some specific examples ... Mathematical developments of the algorithms ... Discussions and comparisons of circuit complexity and performance (accuracy, resolution, and speed of operation) for the different algorithms ...
Note: Reprinted 1983, ISBN 0-89874-318-4, Robert E. Krieger Publishing Co.

senzig1975
¿Web? Calculator Algorithms, Don Senzig, IEEE Compcon Reader Digest, IEEE Catalog No. 75 CH 0920-9C, pp139–141, IEEE, Spring 1975.
Abstract: This paper discusses algorithms for generating the trigonometric, exponential, and hyperbolic functions and their inverses. No invention is claimed here. The algorithm for logarithm was used by Briggs in compiling his table of logarithms in the 1600’s. Other earlier references are (cited). The development presented here is, perhaps, more direct than those given in the above references but leads to the same result.

shirazi1988
¿Web? VLSI designs for redundant binary-coded decimal addition, Behrooz Shirazi, David Y. Y. Yun, and Chang N. Zhang, IEEE Seventh Annual International Phoenix Conference on Computers and Communications, 1988, pp52–56, IEEE, March 1988.
Abstract: Binary-coded decimal (BCD) system provides rapid binary-decimal conversion. However, BCD arithmetic operations are often slow and require complex hardware. One can eliminate the need for carry propagation and thus improve performance of BCD operations by using a redundant binary-coded decimal (RBCD) system. This paper introduces the VLSI design of an RBCD adder. The design consists of two small PLA’s and two four-bit binary adders for one digit of the RBCD adder. The addition delay is constant for n-digit RBCD addition (no carry propagation delay). The VLSI time and space complexities of the design as well as its layout are presented, showing the regularity of the structures. In addition, two simple algorithms and the corresponding hardware designs for conversion between RBCD and BCD are presented.

smith2003
URL
¿Web? Using multiple-precision arithmetic, David M Smith, Computing in Science and Engineering, Vol. 5 #4, pp88–93, IEEE Computer Society, July 2003.
Abstract: High-precision arithmetic is useful in many different computational problems. The most common is a numerically unstable algorithm, for which, say, 53-bit (ANSI/IEEE 754-1985 Standard) double precision would not yield a sufficiently accurate result.
Note: Related papers by same author at: http://myweb.lmu.edu/dmsmith/FMLIB.html

soule1975
¿Web? Addition in an Arbitrary Base Without Radix Conversion, Stephen Soule, Communications of the ACM Vol. 18 #6, pp344–346, ACM Press, June 1975.
Abstract: This paper presents a generalization of an old programming technique; using it, one may add and subtract numbers represented in any radix, including a mixed radix, and stored one digit per byte in bytes of sufficient size. Radix conversion is unnecessary, no looping is required, and numbers may even be stored in a display (I/O) format. Applications to Cobol, MIX, and hexadecimal sums are discussed.
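Note: A well-known form of this kind of technique is the bias trick sketched here for numbers stored one digit per 8-bit byte. It is an illustration under assumptions, not necessarily the paper's exact formulation: a single radix between 2 and 256 is assumed (mixed radix is not handled), and the names pack, unpack and add_packed are hypothetical.

```python
def pack(digits):
    """Pack digits (most significant first) into an int, one digit per byte."""
    value = 0
    for d in digits:
        value = (value << 8) | d
    return value


def unpack(value, ndigits):
    """Unpack `ndigits` bytes of `value` into a digit list, most significant first."""
    return [(value >> (8 * i)) & 0xFF for i in reversed(range(ndigits))]


def add_packed(a, b, radix, ndigits):
    """Add two packed numbers without converting them out of their radix."""
    ones = sum(1 << (8 * i) for i in range(ndigits))  # 0x01 in every digit byte
    bias = 256 - radix
    # Bias every digit of a so that a digit sum >= radix overflows its byte.
    t1 = a + bias * ones
    # One ordinary binary addition now propagates the radix carries.
    t2 = t1 + b
    # t1 ^ b ^ t2 holds the carry into each bit; the carry into the lowest
    # bit of the next byte is the carry out of the current digit.
    carry_out = ((t1 ^ b ^ t2) >> 8) & ones
    # Digits that produced no carry still contain the bias; remove it.
    no_carry = ones ^ carry_out
    return t2 - bias * no_carry
```

For example, unpack(add_packed(pack([2, 5]), pack([3, 7]), 10, 2), 3) yields [0, 6, 2], that is, 25 + 37 = 62. The addition itself involves no loop over digits; the loop that builds the constant `ones` could be replaced by a precomputed value for a fixed word size.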

svoboda1969
¿Web? Decimal Adder with Signed Digit Arithmetic, Antonin Svoboda, IEEE Transactions on Computers, Vol. 18 #3, pp212–215, IEEE, March 1969.
Abstract: The decimal adder with signed digit arithmetic presented here was designed to establish the following facts: the redundant representation of a decimal digit x_i by a 5-bit binary number X_i=3x_i leads to a logical design of extreme simplicity; it is possible to form an additional algorithm for the adder so that it can be used to transform numbers written in a conventional decimal form into a signed digit form, and vice versa.

thapliyal2006
URL
¿Web? Novel BCD Adders and Their Reversible Logic Implementation for IEEE 754r Format, Himanshu Thapliyal, Saurabh Kotiyal, and M. B. Srinivas, Proceedings of the 19th International Conference on VLSI Design (VLSID’06), pp387–392, IEEE, 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard and a major enhancement to the standard is the addition of decimal format. This paper proposes two novel BCD adders called carry skip and carry look-ahead BCD adders respectively. Furthermore, in the recent years, reversible logic has emerged as a promising technology having its applications in low power CMOS, quantum computing, nanotechnology, and optical computing. It is not possible to realize quantum computing without reversible logic. Thus, this paper also provides the reversible logic implementation of the conventional BCD adder as well as the proposed Carry Skip BCD adder using a recently proposed TSG gate. Furthermore, a new reversible gate called TS-3 is also being proposed and it has been shown that the proposed reversible logic implementation of the BCD Adders is much better compared to recently proposed one, in terms of number of reversible gates used and garbage outputs produced. The reversible BCD circuits designed and proposed here form the basis of the decimal ALU of a primitive quantum CPU.

thapliyal2006b
¿Web? Modified Carry Look Ahead BCD Adder With CMOS and Reversible Logic Implementation, Himanshu Thapliyal and Hamid R. Arabnia, Proceedings of the 2006 International Conference on Computer Design (CDES'06), ISBN 1-60132-009-4, pp64–69, CSREA Press, November 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard and a major enhancement to the standard is the addition of decimal format. Firstly, this paper proposes novel two transistor AND & OR gates. The proposed AND gate has no power supply, thus it can be referred as the Powerless AND gate. Similarly, the proposed two transistor OR gate has no ground and can be referred as Groundless OR. Two designs of AND & OR gate without VDD or GND are also shown. Secondly for IEEE 754r format, one novel BCD adder called carry look-ahead BCD adder is also proposed. In order to design the carry look-ahead BCD adder, a novel 4 bit carry look-ahead adder called NCLA is proposed which forms the basic building block of the proposed carry look-ahead BCD adder. The proposed two transistors AND & OR gates are used to provide the optimized small area, low power, high throughput circuitries of the proposed BCD adder. Nowadays, reversible logic is also emerging as a promising computing paradigm having its applications in quantum computing, optical computing and nanotechnology. Thus, reversible logic implementation of the proposed BCD Adder is also shown in this paper.

thapliyal2006c
¿Web? Design of Novel Reversible Carry Look-Ahead BCD Subtractor, Himanshu Thapliyal and Sumedha K. Gupta, Proceedings of the 9th International Conference on Information Technology (ICIT'06), ISBN 0-7695-2635-7, pp253–258, IEEE, December 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard. A major enhancement to the standard is the addition of decimal format, thus the design of BCD arithmetic units is likely to get significant attention. Firstly, this paper introduces a novel carry look-ahead BCD adder and then builds a novel carry look-ahead BCD subtractor based on it. Secondly, it introduces the reversible logic implementation of the proposed carry look-ahead BCD subtractor. We have tried to design the reversible logic implementation of the BCD Subtractor optimal in terms of number of reversible gates used and garbage outputs produced. Thus, the proposed work will be of significant value as the technologies mature.

thomp2004
¿Web? A 64-bit Decimal Floating-Point Adder (extended version), John Thompson, Nandini Karra, and Michael J Schulte, Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Lafayette, LA, pp297–298, IEEE, February 2004.
Abstract: Due to the rapid growth in financial, commercial, and Internet-based applications, there is an increasing desire to allow computers to operate on both binary and decimal floating-point numbers. Consequently, specifications for decimal floating-point arithmetic are being added to the IEEE-754 Standard for Floating-Point Arithmetic. In this paper, we present the design and implementation of a decimal floating-point adder that is compliant with the current draft revision of the IEEE-754 Standard. The adder supports operations on 64-bit (16-digit) decimal floating-point operands. We provide synthesis results indicating the estimated area and delay for our design when it is pipelined to various depths.

thomsen2008
¿Web? Optimized reversible binary-coded decimal adders, Michael Kirkedal Thomsen and Robert Glück, Journal of Systems Architecture: the EUROMICRO Journal, Vol. 54 #7, ISSN 1383-7621, pp697–706, Elsevier, July 2008.
Abstract: Babu and Chowdhury recently proposed, in this journal, a reversible adder for binary-coded decimals. This paper corrects and optimizes their design. The optimized 1-decimal BCD full-adder, a 13x13 reversible logic circuit, is faster, and has lower circuit cost and less garbage bits. It can be used to build a fast reversible m-decimal BCD full-adder that has a delay of only m+17 low-power reversible CMOS gates. For a 32-decimal (128-bit) BCD addition, the circuit delay of 49 gates is significantly lower than is the number of bits used for the BCD representation. A complete set of reversible half- and full-adders for n-bit binary numbers and m-decimal BCD numbers is presented. The results show that special-purpose design pays off in reversible logic design by drastically reducing the number of garbage bits. Specialized designs benefit from support by reversible logic synthesis. All circuit components required for optimizing the original design could also be synthesized successfully by an implementation of an existing synthesis algorithm.

tsen2007b
¿Web? Hardware Design of a Binary Integer Decimal-based Floating-point Adder, Charles Tsen, Sonia Gonzalez-Navarro, and Michael J. Schulte, Proceedings of the IEEE 25th International Conference on Computer Design, 9pp, IEEE, October 2007.
Abstract: Because of the growing importance of decimal floating-point (DFP) arithmetic, specifications for it are included in the IEEE Draft Standard for Floating-point Arithmetic (IEEE P754). In this paper, we present a novel algorithm and hardware design for a DFP adder. The adder performs addition and subtraction on 64-bit operands that use the IEEE P754 binary encoding of DFP numbers, widely known as the Binary Integer Decimal (BID) encoding. The BID adder uses a novel hardware component for decimal digit counting and an enhanced version of a previously published BID rounding unit. By adding more sophisticated control, operations are performed with variable latency to optimize for common cases. We show that a BID-based DFP adder design can be achieved with a modest area increase compared to a single 2-stage pipelined 64-bit fixed-point multiplier. Over 70% of the BID adder's area is due to the 64-bit fixed-point multiplier, which can be shared with a binary floating-point multiplier and hardware for other DFP operations. To our knowledge, this is the first hardware design for adding and subtracting IEEE P754 BID-encoded DFP numbers.

vazquez2007
URL
¿Web? A New Family of High-Performance Parallel Decimal Multipliers, Alvaro Vázquez, Elisardo Antelo, and Paolo Montuschi, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp195–204, IEEE, June 2007.
Abstract: This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD-8421 multiplier operand into minimally redundant signed-digit radix-10, radix-4 and radix-5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix-4 and radix-5 recodings allow the reuse of a conventional parallel binary radix-4 multiplier to perform combined binary/decimal multiplications. Evaluation results show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and other representative alternatives for decimal multiplication.

veerama2007
¿Web? Novel, High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format, Sreehari Veeramachaneni, M. Kirthi Krishna, Lingamneni Avinash, Sreekanth Reddy P, and M. B. Srinivas, IEEE Computer Society Annual Symposium on VLSI (ISVLSI '07), pp343–350, IEEE, May 2007.
Abstract: In view of increasing prominence of commercial, financial and internet-based applications that process data in decimal format, there is a renewed interest in providing hardware support to handle decimal data. In this paper, a new architecture for efficient 1-digit decimal addition of binary coded decimal (BCD) operands, which is the core of high speed multi-operand adders and floating decimal-point arithmetic, is proposed. Based on this 1-digit BCD adder, novel architectures for higher order (n-digit) BCD adders such as ripple carry adder and carry look-ahead adder are derived. The proposed circuits are compared (both qualitatively as well as quantitatively) with the existing circuits in literature and are shown to perform better. Simulation results show that the proposed 1-digit BCD adder achieves an improvement of 40% in delay. The 16-digit BCD lookahead adder using prefix logic is shown to perform at least 80% faster than the existing ripple carry one.
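
For orientation, the conventional textbook 1-digit BCD addition that designs like this one aim to improve on can be sketched as follows. This is not the architecture proposed in the paper; the function names `bcd_digit_add` and `bcd_ripple_add` are hypothetical, and the sketch only models the digit-level behaviour (binary add, then a +6 correction when the sum exceeds 9).

```python
def bcd_digit_add(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two BCD digits (0-9) plus a carry; return (sum_digit, carry_out)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("operands must be BCD digits and carry_in must be 0 or 1")
    s = a + b + carry_in          # plain binary addition, range 0..19
    if s > 9:                     # result is not a valid BCD digit
        s = (s + 6) & 0xF         # adding 6 skips the six unused 4-bit codes
        return s, 1
    return s, 0


def bcd_ripple_add(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Ripple-carry addition of two equal-length digit lists (least significant first)."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    carry = 0
    out = []
    for a, b in zip(x, y):
        digit, carry = bcd_digit_add(a, b, carry)
        out.append(digit)
    return out, carry
```

In this model the carry ripples through every digit position, which is the serial dependency that carry look-ahead and prefix-based BCD adders are designed to shorten.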

veerama2008
¿Web? A Novel Carry-Look Ahead Approach to a Unified BCD and Binary Adder/Subtractor, Sreehari Veeramachaneni, M. Kirthi Krishna, G. V. Prateek, S. Subroto, S. Bharat, and M. B. Srinivas, Proceedings of the 21st International Conference on VLSI Design (VLSID '08), ISBN 0-7695-3083-4, pp547–552, IEEE Computer Society, January 2008.
Abstract: With the increasing prominence of commercial, financial and internet-based applications, which process decimal data, there is an increasing interest in providing hardware support for such data. In this paper, a new architecture for an efficient binary and Binary Coded Decimal (BCD) adder/subtractor is presented. This employs a new method of subtraction, unlike the existing designs which mostly use 10’s complements, to obtain a much lower latency. Though there is a necessity of correction in some cases, the delay overhead is minimal. A complete discussion about such cases and the required logic to process is presented. The architecture is run-time reconfigurable to facilitate both BCD and binary operations, including signed and unsigned numbers. The proposed circuits are compared (both qualitatively as well as quantitatively) with the existing circuits in literature and are shown to perform better. Simulation results show that the proposed architecture is at least 11% faster than the existing designs.

vowels1992
¿Web? Division by 10, R. A. Vowels, Australian Computer Journal, Vol. 24 #3, pp81–85, ACS, August 1992.
Abstract: Division of a binary integer and a binary floating-point mantissa by 10 can be performed with shifts and adds, yielding a significant improvement in hardware execution time, and in software execution time if no hardware divide instruction is available. Several algorithms are given, appropriate to specific machine word sizes, hardware and hardware instructions available, and depending on whether a remainder is required.
    The integer division algorithms presented here contain a new strategy that produces the correct quotient directly, without the need for the supplementary correction required of previously published algorithms. The algorithms are competitive in time with binary coded decimal (BCD) divide by 10.
    Both the integer and floating-point algorithms are an order of magnitude faster than conventional division.
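
As a rough illustration of dividing by 10 with only shifts and adds, the sketch below uses the widely known approximate-then-correct scheme for unsigned 32-bit integers. It is not one of the algorithms from this paper: it needs exactly the kind of final correction step that the paper's new strategy is said to avoid. The function name `div10` is hypothetical.

```python
def div10(n: int) -> int:
    """Return n // 10 for an unsigned 32-bit n using shifts, adds and one correction."""
    if not 0 <= n < 2**32:
        raise ValueError("n must fit in an unsigned 32-bit word")
    q = (n >> 1) + (n >> 2)        # q ~= 0.75 * n
    q += q >> 4                    # successive terms refine q toward 0.8 * n
    q += q >> 8
    q += q >> 16
    q >>= 3                        # q ~= n / 10, possibly one too small
    r = n - (((q << 2) + q) << 1)  # remainder estimate: n - 10 * q
    return q + (1 if r > 9 else 0) # supplementary correction step
```

The multiplication by 10 in the remainder estimate is itself written as `((q << 2) + q) << 1`, so no multiply or divide instruction is assumed.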

wang2004
¿Web? Decimal Floating-Point Division Using Newton-Raphson Iteration, Liang-Kai Wang and Michael J Schulte, Proceedings of the 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP’04), pp84–95, IEEE Computer Society Press, September 2004.
Abstract: Decreasing feature sizes allow additional functionality to be added to future microprocessors to improve the performance of important application domains. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE-754R). This paper presents an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an optimized piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a simplified combined decimal incrementer/decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with IEEE-754R, has an estimated critical path delay of 0.69 ns when implemented using LSI Logic’s 0.11 micron gflx-p standard cell library.
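
The basic (unmodified) Newton-Raphson reciprocal iteration, x[i+1] = x[i] * (2 - d * x[i]), can be sketched in software with Python's `decimal` module. This is only a minimal illustration under stated assumptions: it uses a crude constant first guess instead of the paper's piecewise linear approximation, it does not reproduce the modified iteration or the specialized rounding technique, and the name `nr_divide` and its parameters are hypothetical. Because the final step simply rounds an approximate product, correct rounding is not guaranteed in every case.

```python
from decimal import Context, Decimal


def nr_divide(n: Decimal, d: Decimal, digits: int = 16, iterations: int = 12) -> Decimal:
    """Approximate n / d for finite operands via Newton-Raphson on the reciprocal of d."""
    if d.is_zero():
        raise ZeroDivisionError("division by zero")
    work = Context(prec=2 * digits + 4)      # working precision with guard digits
    e = d.adjusted()                         # |d| = m * 10**e with 1 <= m < 10
    m = work.scaleb(work.abs(d), Decimal(-e))
    x = Decimal("0.1")                       # crude first guess for 1/m (assumption)
    two = Decimal(2)
    for _ in range(iterations):              # error roughly squares on each pass
        x = work.multiply(x, work.subtract(two, work.multiply(m, x)))
    recip = work.scaleb(x, Decimal(-e))      # undo the normalization: ~ 1/|d|
    q = work.multiply(n, recip)
    if d.is_signed():
        q = work.minus(q)
    return Context(prec=digits).plus(q)      # round to the target precision
```

For example, `nr_divide(Decimal(1), Decimal(3))` gives a 16-digit approximation of one third. The fixed iteration count is chosen generously for the weak first guess; a better initial approximation is what lets a hardware design get away with far fewer iterations.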

wang2005
¿Web? Decimal Floating-Point Square Root Using Newton-Raphson Iteration, Liang-Kai Wang and Michael J Schulte, Proceedings of the 16th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP’05), pp309–315, IEEE Computer Society Press, July 2005.
Abstract: With continued reductions in feature size, additional functionality may be added to future microprocessors to boost the performance of important application domains. Due to growth in commercial, financial, and Internet-based applications, decimal floating point arithmetic is now attracting more attention, and hardware support for decimal operations is being considered by various computer manufacturers. In order to standardize decimal number formats and operations, specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE-754R). This paper presents an efficient arithmetic algorithm and hardware design for decimal floating-point square root. This design uses an optimized piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a modified decimal multiplier. Synthesis results show that a 64-bit (16-digit) implementation of the decimal square root, which is compliant with the IEEE-754R, has an estimated critical path delay of 0.95 ns and maximum latency of 210 clock cycles when implemented using LSI Logic’s 0.11 micron Gflx-P Standard Cell library.

wang2007
URL
¿Web? Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding, Liang-Kai Wang and Michael J. Schulte, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp56–65, IEEE, June 2007.
Abstract: Shrinking feature sizes give more headroom for designers to extend the functionality of microprocessors. The IEEE 754R working group has revised the IEEE 754-1985 Standard for Binary Floating-Point Arithmetic to include specifications for decimal floating-point arithmetic and IBM recently announced incorporating a decimal floating-point unit into their POWER6 processor. As processor support for decimal floating-point arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common decimal floating-point arithmetic algorithms. This paper presents novel designs for a decimal floating-point adder and a decimal floating-point multifunction unit. To reduce their delay, both the adder and the multifunction unit use decimal injection-based rounding, a new form of decimal operand alignment, and a fast flag-based method for rounding and overflow detection. Synthesis results indicate that the proposed adder is roughly 21% faster and 1.6% smaller than a previous decimal floating-point adder design, when implemented in the same technology. Compared to the decimal floating-point adder, the decimal floating-point multifunction unit provides six additional operations, yet only has 2.8% more delay and 9.7% more area.

wang2007c
¿Web? A Decimal Floating-Point Divider using Newton-Raphson Iteration, Liang-Kai Wang and Michael J. Schulte, Journal of VLSI Signal Processing Systems, Vol. 49 #1, ISSN 0922-5773, pp3–18, Kluwer Academic Publishers, October 2007.
Abstract: Increasing chip densities and transistor counts provide more room for designers to add functionality for important application domains into future microprocessors. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE P754). In this paper, we present an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an efficient piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a simplified decimal incrementer and decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with the current version of IEEE P754, has an estimated critical path delay of 0.69 ns (around 13 FO4 inverter delays) when implemented using LSI Logic’s 0.11 micron Gflx-P standard cell library.

wang2007d
¿Web? Processor support for decimal floating-point arithmetic, Liang-Kai Wang, ISBN 978-0-549-19463-7, 157pp, University of Wisconsin at Madison, 2007.
Abstract: Decimal data permeates society, as humans most commonly use base-ten numbers. Although microprocessors normally use base-two binary arithmetic to obtain faster execution times and simpler circuitry, binary numbers cannot represent decimal fractions exactly. This leads to large errors being accumulated after several decimal operations. Furthermore, binary floating-point arithmetic operations perform binary rounding instead of decimal rounding. Consequently, applications, such as financial, commercial, tax, and Internet-based applications, which are sensitive to representation and rounding errors, often require decimal arithmetic. Due to the increasing importance of and demand for decimal arithmetic, its formats and operations have been specified in the IEEE Draft Standard for Floating-point Arithmetic (IEEE P754).
   Most decimal applications use software routines and binary arithmetic to emulate decimal operations. Although this approach eliminates errors due to converting between binary and decimal numbers and provides decimal rounding to mirror manual calculations, it results in long latencies for numerically intensive commercial applications. This is because software emulation of decimal floating-point (DFP) arithmetic has significant overhead due to function calls, dealing with decimal formats, operand alignment, decimal rounding, and special case and exception handling.
   This dissertation investigates processor support for decimal floating-point arithmetic. It first reviews recent progress in decimal arithmetic, including decimal encodings, the IEEE P754 Draft Standard, and software packages, hardware designs, and benchmark suites for decimal arithmetic. Next, this dissertation presents novel arithmetic algorithms and hardware designs for basic DFP operations, including DFP addition, subtraction, division, square root, and others. Most of the hardware designs presented in this dissertation are the first published designs compliant with the IEEE P754 Draft Standard. Finally, to study the performance impact of DFP instructions and hardware, this dissertation presents the first publicly available benchmark suite for DFP arithmetic. This benchmark suite, along with instruction set extensions and a decimal-enhanced processor simulator, are used to demonstrate that providing fast hardware support for DFP operations leads to significant performance benefits to DFP-intensive applications.

watanabe2006
¿Web? Formal Design of Decimal Arithmetic Circuits Using Arithmetic Description Language, Yuki Watanabe, Naofumi Homma, Takafumi Aoki, and Tatsuo Higuchi, IEEE International Symposium on Intelligent Signal Processing and Communications, 2006 (ISPACS '06), ISBN 0-7803-9733-9, pp419–422, IEEE, December 2006.
Abstract: This paper presents a formal design of decimal arithmetic circuits using an arithmetic description language called ARITH. The use of ARITH makes possible (i) formal description of arithmetic algorithms including those using unconventional number systems, (ii) formal verification of described arithmetic algorithms, and (iii) translation of arithmetic algorithms to the equivalent HDL descriptions. In this paper, we demonstrate the potential of ARITH through an experimental design of binary coded decimal (BCD) arithmetic circuits.

you2006
¿Web? Dynamic decimal adder circuit design by using the carry look ahead, Younggap You, Yong Dae Kim, and Jong Hwa Choi, IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, 3pp, IEEE Computer Society, April 2006.
Abstract: This paper presents a carry look ahead (CLA) circuitry design based on dynamic circuit aiming at delay reduction in addition of BCD coded decimal numbers. The performance of the proposed dynamic decimal adder is analyzed demonstrating its speed improvement. Timing simulation on the proposed decimal addition circuit employing 0.25µm CMOS technology yields the worst case delay of 622 ns.

yuen1977
¿Web? A New Representation for Decimal Numbers, C. K. Yuen, IEEE Transactions on Computers, Vol. 26 #12, pp1286–1288, IEEE, December 1977.
Abstract: A new representation for decimal numbers is proposed. It uses a mixture of positive and negative radixes to ensure that the maximum value of a four bit decimal digit is 9. This eliminates the more complex carry generation process required in BCD addition.

The 84 references listed on this page are selected from the bibliography on Decimal Arithmetic collected by Mike Cowlishaw. Please see the index page for more details and other categories.