Hardware

Bibliography of material on Decimal Arithmetic [Index]

Decimal Arithmetic: Hardware

agrawal1974
¿Web? Fast B. C. D. Multiplier, Dharma P. Agrawal, Electronics Letters, Vol. 10 #12, pp237–238, IEE, 13 June 1974.
Abstract: A fast b.c.d multiplier is proposed, based on obtaining the product of a 1-digit multiplicand and a 1-digit multiplier in a single row of adders. For high-speed operation, the carry-save technique, universally adopted for binary multipliers, is used.

allard1963
¿Web? Mixed Congruential Random Number Generators for Decimal Machines, J. L. Allard, A. R. Dobell, and T. E. Hull, Journal of the ACM, Vol. 10 #2, pp131–141, ACM Press, April 1963.
Abstract: Random number generators of the mixed eongruential type have recently been proposed. They appear to have some advantages over those of the multiplicative congruential type, but they have not been thoroughly tested. This paper summarizes the results of extensive testing of these generators which has been carried out on a decimal machine. Most results are for word length 10, and special attention is given to simple multipliers which give fast generators. But other word lengths and many other multipliers are considered. A variety of additive constants is also used. It turns out that these mixed generators, in contrast to the multiplicative ones, are not consistently good from a statistical point of view. The cases which are bad seem to belong to a well-defined class which, unfor unfortunately, includes most of the generators associated with the simple multipliers. However, a surprise result is that all generators associated with one of the simplest and fastest multipliers, namely 101, turn out to be consistently good for word lengths greater than seven digits. A final section of the paper suggests a simple theoretical explanation of these experimental results.

babu2005
¿Web? Design of a Reversible Binary Coded Decimal Adder by Using Reversible 4-bit Parallel Adder, Hafiz Md. Hasan Babu and Ahsan Raja Chowdhury, Proceedings of the 18th International Conference on VLSI Design (VLSID 2005), ISBN 0-7695-2264-5, pp255–260, IEEE, 2005.
Abstract: In this paper, we have proposed a design technique for the reversible circuit of binary coded decimal (BCD) adder. The proposed circuit has the ability to add two 4-bits binary variables and it transforms the addition into the appropriate BCD number with efficient error correcting modules where the operations are reversible. We also show that the proposed design technique generates the reversible BCD adder circuit with minimum number of gates as well as the minimum number of garbage outputs.

bashe1954
¿Web? The IBM Type 702, An Electronic Data Processing Machine for Business, C. J. Bashe, W. Buchholz, and N. Rochester, Journal of the ACM (JACM), Vol. 1 #4, pp149–169, ACM Press, October 1954.
Abstract: The main features of the IBM Electronic Data Processing Machine, Type 702, are discussed from the programmer’s point of view to illustrate how it was designed specifically to solve large accounting and statistical problems in business, industry, and government. The 702 exploits in one integrated system the high speed and storage capacity of magnetic tape, the accessibility of electrostatic memory supplemented by large auxiliary storage on magnetic drums, the flexibility of punched-card document input, the page printing output of modern accounting machines, and the technology of general-purpose, stored-program, electronic computers. The 702 is a serial machine with decimal arithmetic. Its serial nature provides several unusual logical features of great aid in programming accounting problems.

bata1971
¿Web? The Gamma 60: The computer that was ahead of its time, M. Bataille, Honeywell Computer Journal Vol. 5 #3, pp99–105, Honeywell, 1971.
Abstract: Prior to 1960 the Compagnie des Machines Bull (now Honeywell Bull) delivered the first large computer system with an architecture designed for multiprogramming. Many unique features of the Gamma 60 were forerunners of present system architecture concepts. This article revisits these concepts.

biswas2008
¿Web? A Novel Approach to Design BCD Adder and Carry Skip BCD Adder, Ashis Kumer Biswas, Md. Mahmudul Hasan, Moshaddek Hasan, Ahsan Raja Chowdhury, and Hafiz Md. Hasan Babu, Proceedings of the 21st International Conference on VLSI Design (VLSID '08), ISBN 0-7695-3083-4, pp566–571, IEEE Computer Society, January 2008.
Abstract: Reversible logic has become one of the most promising research areas in the past few decades and has found its applications in several technologies; such as low power CMOS, nanocomputing and optical computing. This paper presents improved and efficient reversible logic implementations for Binary Coded Decimal (BCD) adder as well as Carry Skip BCD adder. It has been shown that the modified designs outperform the existing ones in terms of number of gates, number of garbage output and delay.

biswas2008b
¿Web? Efficient approaches for designing reversible Binary Coded Decimal adders, Ashis Kumer Biswas, Md. Mahmudul Hasan, Ahsan Raja Chowdhury, and Hafiz Md. Hasan Babu, Microelectronics Journal, Vol. 39 #12, ISSN 0026-2692, pp1693–1703, Elsevier, December 2008.
Abstract: Reversible logic has become one of the most promising research areas in the past few decades and has found its applications in several technologies; such as low-power CMOS, nanocomputing and optical computing. This paper presents improved and efficient reversible logic implementations for Binary Coded Decimal (BCD) adder as well as Carry Skip BCD adder. It has been shown that the modified designs outperform the existing ones in terms of number of gates, number of garbage outputs, delay, and quantum cost. In order to show the efficiency of the proposed designs, lower bounds of the reversible BCD adders in terms of gates and garbage outputs are proposed as well.

bohl1987
¿Web? A Decimal Floating-Point Processor for Optimal Arithmetic, G. Bohlender and T. Teufel, Computer arithmetic: Scientific Computation and Programming Languages, ISBN 3-519-02448-9, pp31–58, B. G. Teubner Stuttgart, 1987.
Abstract: A floating-point processor for optimal arithmetic should perform scalar products with maximum accuracy in addition to the usual operations +, -, *, /. This means that scalar products have to be computed with an error of at most one bit of the least significant digit, even if cancellation of leading digits occurs. In order to avoid conversion errors during input and output of numerical data, the decimal number system should be chosen.
    The arithmetic processor BAP-SC performs these operations in a 64 bit floating-point format with 13 decimal digits in the mantissa. The prototype is built in bit-slice technology on wire-wrap boards. Interfaces have been developed [sic] for several busses and computers.
    The arithmetic processor is fully integrated in the programming language PASCAL-SC. It supports operations in higher numerical spaces and new numerical algorithms that compute verified results with error bounds.

burks1946
¿Web? Preliminary discussion of the logical design of an electronic computing instrument, Arthur W. Burks, Herman H. Goldstine, and John von Neumann, 42pp, Inst. for Advanced Study, Princeton, N. J., June 28, 1946.
Abstract: Inasmuch as the completed device will be a general-purpose computing machine it should contain certain main organs relating to arithmetic, memory-storage, control and connection with the human operator. It is intended that the machine be fully automatic in character, i.e. independent of the human operator after the computation starts...
Note: Reprinted in von Neumann’s Collected Works, Vol. 5, A. H. Taub, Ed. (Pergamon, London, 1963), pp 34-79, and also in Computer Structures: Reading and Examples, Bell & Newell, McGraw-Hill Inc., 1971. Now widely available on the Internet.
    Contract W-36-034-ORD-H81. R&D Service, Ordnance Department, US Army and Institute for Advanced Study, Princeton

burro1964
¿Web? Burroughs B5500 Information Processing Systems Reference Manual, Burroughs Corporation, 224pp, Burroughs Corporation, Detroit, Michigan, 1964.
Abstract: This reference manual describes the hardware characteristics of the Burroughs B 5500 Information Processing System by presenting detailed information concerning the functional operation of the entire system. The B 5500 is a large-scale, high-speed, solid-state computer which represents a departure from the conventional computer system concept. It is a problem language oriented system rather than the conventional hardware oriented system. Because of the design concept of the B 5500, there exists a strong interdependence between the hardware and the Master Control Program which directs the system. The material presented herein pertains only to the hardware considerations, whereas the Master Control Program is discussed under separate cover.

busa2001
¿Web? The IBM z900 Decimal Arithmetic Unit, Fadi Y. Busaba, Christopher A. Krygowski, Wen H. Li, Eric M. Schwarz, and Steven R. Carlough, Conference Record of the 35th Asilomar Conference on Signals, Systems and Computers, Vol. 2, ISBN 0 7803 7147 X, pp1335–1339, IEEE, Nov. 2001.
Abstract: As the cost for adding function to a processor continues to decline, processor designs are including many additional features. An example of this trend is the appearance of graphics engines and compression engines on midrange and even low end microprocessors. One area that has the potential to capture chip real estate is the decimal arithmetic engine because of its importance in financial and business applications. Studies show that 55% of the numeric data stored on commercial databases are in decimal format. Although decimal arithmetic is supported in many software languages it is not yet available on many microprocessors. This paper details the decimal arithmetic engine in the recently announced z900 microprocessor.
Note: IEEE cat #01ch37256.

busa2004
¿Web? The Design of the Fixed Point Unit for the z990 Microprocessor, Fadi Y. Busaba, Timothy Slegel, Steven R. Carlough, Christopher A. Krygowski, and John G Rell, Proceedings of the 14th ACM Great Lakes symposium on VLSI, ISBN 1-58113-853-9, pp364 – 367, ACM Press, 2004.
Abstract: The paper presents the design of the Fixed Point Unit (FXU) for the IBM eServer z990 microprocessor (announced in 2Q ’03) that runs at 1.2 GHz. The FXU is capable of executing two Register-Memory instructions including arithmetic instructions and a branch instruction in a single cycle. The FXU executes a total of 369 instructions that operate on variable size operands (1 to 256 bytes). The instruction set include decimal arithmetic with multiplies and divides, binary arithmetic, shifts and rotates, loads/stores, branches, long moves, logical operations, convert instructions, and other special instructions. The FXU consists of 64-bit dataflow stack that is custom designed and a control stack that is synthesized. The current FXU is the first superscalar design for the CMOS z-series machines, has a new improved decimal unit, and has for the first time a 16x64 bit binary multiplier.

castell2006
¿Web? A 64-bit Decimal Floating-Point Comparator, Ivan D. Castellanos and James E. Stine, IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06), pp138–144, IEEE, 2006.
Abstract: Decimal arithmetic is growing in importance as scientific studies reveal that current financial and commercial applications spend a high percentage overhead in this type of calculations. Typically, software is utilized to emulate decimal floating point arithmetic in these applications. On the other hand, functional units that employ decimal floating point hardware can improve performance by two or three orders of magnitude. This paper presents the design and implementation of a novel decimal floating-point comparator compliant with the current draft revision of the IEEE-754 Standard for floating-point arithmetic. It utilizes a novel BCD magnitude comparator with logarithmic delay and it supports 64-bit decimal floating-point numbers. Area and delay results are examined for an implementation in TSMC SCN6M SCMOS technology.

castell2008
¿Web? Compressor trees for decimal partial product reduction, Ivan D. Castellanos and James E. Stine, Proceedings of the 18th ACM Great Lakes symposium on VLSI, ISBN 978-1-59593-999-9, pp107–110, ACM Press, 2008.
Abstract: Decimal multiplication has grown in interest due to the recent announcement of new IEEE 754R standards and the availability of high-speed decimal computation hardware. Prior research enabled partial products to be coded more efficiently for their use in radix 10 architectures. This paper clarifies previous techniques for partial product reduction using carry-save adders and presents a new 4:2 compressor structure. This new structure improves performance at the expense of more gates, however, regularity is introduced into the circuit to promote implementations in Very Large Scale Integration (VLSI) Designs. Results are presented and compared for several designs using a TSMC SCN6M 0.18 µm feature size.

cody1984
¿Web? A Proposed Radix- and Word-length-independent Standard for Floating-point Arithmetic, W. J. Cody, J. T. Coonen, D. M. Gay, K. Hanson, D. Hough, W. Kahan, R. Karpinski, J. Palmer, F. N. Ris, and D. Stevenson, IEEE Micro magazine, Vol. 4 #4, pp86–100, IEEE, August 1984.
Abstract: This article places [Draft 1.0 of IEEE 854] before the public for the first time. ... This article also includes material that describes how decisions were reached in preparing the P854 draft and explains how to overcome some of the implementation problems.
Note: Reprinted in ACM SIGNUM, Vol. 20, #1, pp35-51, 1985.

cohen1983
¿Web? CADAC: A Controlled-Precision Decimal Arithmetic Unit, Marty S. Cohen, T. E. Hull, and V. Carl Hamacher, IEEE Transactions on Computers, Vol. 32 #4, pp370–377, IEEE, April 1983.
Abstract: This paper describes the design of an arithmetic unit called CADAC (clean arithmetic with decimal base and controlled precision). Programming language specifications for carrying out “ideal” floating-point arithmetic are described first. These specifications include detailed requirements for dynamic precision control and exception handling, along with both complex and interval arithmetic at the level of a programming language such as Fortran or PL/I.
    CADAC is an arithmetic unit which performs the four floating-point operations add/subtract/multiply/divide on decimal numbers in such a way as to support all the language requirements efficiently. A three-level pipeline is used to overlap two-digit-at-a-time serial processing of the partial products/remainders. Although the logic design is relatively complex, the performance is efficient, and the advantages gained by implementing programmer-controlled precision directly in the hardware are significant.

couleur1958
¿Web? BIDEC – A Binary-to-Decimal or Decimal-to-Binary Converter, J. F. Couleur, IRE Transactions on Electronic Computers, Vol. EC-7, pp313–316, IRE, 1958.
Abstract: Simple, high-speed devices to convert binary, binary coded octal, or Gray code numbers to binary coded decimal numbers or vice versa is described. Circuitry required is four shift register stages per decimal digit plus one 30-diode network per decimal digit. In simple form the conversion requires two operations per binary bit but is theoretically capable of working at one operation per bit.

cowlis2002
¿Web? Densely Packed Decimal Encoding, Michael F. Cowlishaw, IEE Proceedings – Computers and Digital Techniques, Vol. 149 #3, ISSN 1350-2387, pp102–104, IEE, London, May 2002.
Abstract: Chen-Ho encoding is a lossless compression of three Binary Coded Decimal digits into 10 bits using an algorithm which can be applied or reversed using only simple Boolean operations. An improvement to the encoding which has the same advantages but is not limited to multiples of three digits is described. The new encoding allows arbitrary-length decimal numbers to be coded efficiently while keeping decimal digit boundaries accessible. This in turn permits efficient decimal arithmetic and makes the best use of available resources such as storage or hardware registers.

dadda2007
¿Web? Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach, Luigi Dadda, IEEE Transactions on Computers, Vol. 56 #10, ISSN 0018-9340, pp1320–1328, IEEE, October 2007.
Abstract: Decimal arithmetic has been in recent years revived due to the large amount of data in commercial applications. We consider the problem of Multi Operand Parallel Decimal Addition with an approach that uses binary arithmetic, suggested by the adoption of BCD numbers. This involves corrections in order to obtain the BCD result, or a binary to decimal conversion. We adopt the latter approach, particularly efficient for a large number of addends. Conversion requires a relatively small area and can afford fast operation. The BD conversion, moreover, allows an easy alignment of the sums of adjacent columns. We treat the design of BCD digit adders using fast carry free adders and the conversion problem through a known parallel scheme using elementary conversion cells. Spreadsheets have been developed for adding several BCD digits and for simulating the binary to decimal conversion as design tool.

duale2007
URL
¿Web? Decimal floating-point in z9: An implementation and testing perspective, A. Y. Duale, M. H. Decker, H.-G. Zipperer, M Aharoni, and T. J. Bohizic, IBM Journal of Research and Development, Vol. 51 #1/2, ISSN 0018-8646, pp217–227, IBM, January 2007.
Abstract: Although decimal arithmetic is widely used in commercial and financial applications, the related computations are handled in software. As a result, applications that use decimal data may experience performance degradations. Use of the newly defined decimal floating-point (DFP) format instead of binary floating-point is expected to significantly improve the performance of such applications. System z9� is the first IBM machine to support the DFP instructions. We present an overview of this implementation and provide some measurement of the performance gained using hardware assists. Various tools and techniques employed for the DFP verification on unit, element, and system levels are presented in detail. Several groups within IBM collaborated on the verification of the new DFP facility, using a common reference model to predict DFP results.

eisen2007
¿Web? IBM POWER6 accelerators: VMX and DFU, L. Eisen, J. W. Ward III, H.-W. Tast, N. M�ding, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, and S. R. Carlough, IBM Journal of Research and Development Vol. 51 #6, ISSN 0018-8646, pp663–683, IBM, November 2007.
Abstract: The IBM POWER6 microprocessor core includes two accelerators for increasing performance of specific workloads. The vector multimedia extension (VMX) provides a vector acceleration of graphic and scientific workloads. It provides single instructions that work on multiple data elements. The instructions separate a 128-bit vector into different components that are operated on concurrently. The decimal floating-point unit (DFU) provides acceleration of commercial workloads, more specifically, financial transactions. It provides a new number system that performs implicit rounding to decimal radix points, a feature essential to monetary transactions. The IBM POWER processor instruction set is substantially expanded with the addition of these two accelerators. The VMX architecture contains 176 instructions, while the DFU architecture adds 54 instructions to the base architecture. The IEEE 754R Binary Floating-Point Arithmetic Standard defines decimal floating-point formats, and the POWER6 processor—on which a substantial amount of area has been devoted to increasing performance of both scientific and commercial workloads—is the first commercial hardware implementation of this format.

erle2002
¿Web? Potential Speedup with Decimal Floating-Point Hardware, Mark A Erle, Michael J Schulte, and J G Linebarger, Proceedings of the Thirty Sixth Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, California, pp1073–1077, IEEE Press, November 2002.
Abstract: This paper address the potential speedup achieved by using decimal floating-point hardware, instead of software routines, on a high-performance super-scalar architecture. Software routines were written to performag decimal addition, subtraction, multiplication, and division. Cycle counts were then measured for each instruction using the Simplescalar simulator. After this, new hardware algorithms were developed, existing algorithms were analyzed, and cycle counts were estimated for the same set of instructions using specialized decimal floating-point hardware. This data was then used to show the potential speedup obtained for programs with different instruction mixes and a recently developed benchmark.

erle2003
¿Web? Decimal Multiplication Via Carry-Save Addition, Mark A Erle and Michael J Schulte, Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors, the Hague, Netherlands,, pp348–358, IEEE Computer Society Press, June 2003.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents two novel designs for fixed-point decimal multiplication that utilize decimal carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses decimal carry-save addition in the iterative portion of the design is presented. Then, a second multiplier design is proposed with several notable improvements including fast generation of multiplicand multiples that do not need to be stored, the use of decimal (4:2) compressors, and a simplified decimal carry-propagate addition to produce the final product. When multiplying two n-digit operands to produce a 2n-digit product, the improved multiplier design has a worst-case latency of n + 4 cycles and an initiation interval of n + 1 cycles. Three data-dependent optimizations, which help reduce the multipliers’ average latency, are also described. The multipliers presented can be extended to support decimal floating-point multiplication.

erle2005
¿Web? Decimal Multiplication With Efficient Partial Product Generation, Mark A Erle, Eric Schwarz, and Michael J Schulte, Proceedings of the 17th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2366-8, pp21–28, IEEE, June 2005.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that utilizes a simple recoding scheme to produce signed-magnitude representations of the operands thereby greatly simplifying the process of generating partial products for each multiplier digit. The partial products are generated using a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit form with two digits per position, and then combined via a combinational circuit. As the signed-digit partial products are developed one at a time while traversing the recoded multiplier operand from the least significant digit to the most significant digit, each partial product is added along with the accumulated sum of previous partial products via a signed-digit adder. This work is significantly different from other work employing digit-by-digit multipliers due to the efficiency gained by restricting the range of digits throughout the multiplication process.

erle2007
URL
¿Web? Decimal Floating-Point Multiplication Via Carry-Save Addition, Mark A. Erle, Michael J. Schulte, and Brian J. Hickmann, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp46–55, IEEE, June 2007.
Abstract: Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of a decimal floating-point multiplier that complies with specifications for decimal multiplication given in the draft revision of the IEEE 754 Standard for Floating-point Arithmetic (IEEE 754R). This multiplier extends a previously published decimal fixedpoint multiplier design by adding several features including exponent generation, sticky bit generation, shifting of the intermediate product, rounding, and exception detection and handling. The core of the decimal multiplication algorithm is an iterative scheme of partial product accumulation employing decimal carry-save addition to reduce the critical path delay. Novel features of the proposed multiplier include support for decimal floating-point numbers, on-thefly generation of the sticky bit, early estimation of the shift amount, and efficient decimal rounding. Area and delay estimates are provided for a verified Verilog register transfer level model of the multiplier.

erle2008
URL
¿Web? Algorithms and Hardware Designs for Decimal Multiplication, Mark A. Erle, 217pp, Lehigh University, November 2008.
Abstract: Although a preponderance of business data is in decimal form, virtually all floating-point arithmetic units on today’s general-purpose microprocessors are based on the binary number system. Higher performance, less circuitry, and better overall error characteristics are the main reasons why binary floating-point hardware (BFP) is chosen over decimal floating-point (DFP) hardware. However, the binary number system cannot precisely represent many common decimal values. Further, although BFP arithmetic is well-suited for the scientific community, it is quite different from manual calculation norms and does not meet many legal requirements.
   Due to the shortcomings of BFP arithmetic, many applications involving fractional decimal data are forced to perform their arithmetic either entirely in software or with a combination of software and decimal fixed-point hardware. Providing DFP hardware has the potential to dramatically improve the performance of such applications. Only recently has a large microprocessor manufacturer begun providing systems with DFP hardware. With available die area continually increasing, dedicated DFP hardware implementations are likely to be offered by other microprocessor manufacturers.
   This dissertation discusses the motivation for decimal computer arithmetic, a brief history of this arithmetic, and relevant software and processor support for a variety of decimal arithmetic functions. As the context of the research is the IEEE Standard for Floating-point Arithmetic (IEEE 754-2008) and two-state transistor technology, descriptions of the standard and various decimal digit encodings are described.
   The research presented investigates algorithms and hardware support for decimal multiplication, with particular emphasis on DFP multiplication. Both iterative and parallel implementations are presented and discussed. Novel ideas are advanced such as the use of decimal counters and compressors and the support of IEEE 754-2008 floating-point, including early estimation of the shift amount, in-line exception handling, on-the-fly sticky bit generation, and efficient decimal rounding. The iterative and parallel, decimal multiplier designs are compared and contrasted in terms of their latency, throughput, area, delay, and usage.
   The culmination of this research is the design and comparison of an iterative DFP multiplier with a parallel DFP multiplier. The iterative DFP multiplier is significantly smaller and may achieve a higher practical frequency of operation than the parallel DFP multiplier. Thus, in situations where the area available for DFP is an important design constraint, the iterative DFP multiplier may be an attractive implementation. However, the parallel DFP multiplier has less latency for a single multiply operation and is able to produce a new result every cycle. As for power considerations, the fewer overall devices in the iterative multiplier, and more importantly the fewer storage elements, should result in less leakage. This benefit is mitigated by its higher latency and lower throughput.
   The proposed implementations are suitable for general-purpose, server, and mainframe microprocessor designs. Depending on the demand for DFP in human-centric applications, this research may be employed in the application-specific integrated circuits (ASICs) market.
Note: Available at speleotrove.com.

glads1991
¿Web? A method of designing a decimal arithmetic processor, M. A. Gladshtein, Automatic Control and Computer Sciences, Vol. 25 #6, pp51–56, 1991.
Abstract: The advantages and drawbacks of binary numeric coding in digital computers have been considered. This type of coding has been shown ineffective in processing large data arrays especially when represented in the floating-point form. Also, the low efficiency of conventionally employed decimal computational procedures using the so-called corrections has been noted. It has been proposed, in designing digital computers, to renounce the principle of binary computations in favor of decimal operations on the basis of stored addition and multiplication tables using binary-decimal numeric coding. A version of circuit design for a decimal processor, algorithms and microprograms for addition and multiplication operations have been described. Advantages inherent in the method proposed have been analyzed.
Note: Translated from Avtomatika i Vychislitel’naya Tekhnika UDC 681.3.48.

golds1946
¿Web? The Electronic Numerical Integrator and Computer (ENIAC), H. H. Goldstine and Adele Goldstine, IEEE Annals of the History of Computing, Vol. 18 #1, pp10–16, IEEE, 1996.
Abstract: It is our purpose in the succeeding pages to give a brief description of the ENIAC and an indication of the kinds of problems for which it can be used. This general purpose electronic computing machine was recently made public by the Army Ordnance Department for which it was developed by the Moore School of Electrical Engineering. The machine was developed primarily for the purpose of calculating firing tablcs for the armed forces. Its design is, however, sufficiently general to permit the solution of a large class of numerical problems which could hardly be attempted by more conventional computing tools.
    In order easily to obtain sufficient accuracy for scientific computations, the ENIAC was designed as a digital device. The equipment normally handles signed 10-digit numbers expressed in the decimal system. It is, however, so constructed that operations with as many as 20 digits are possible.
    The machine is automatically sequenced in the sense that all instructions needed to carry out a computation are given to it before the computation commences. It will be seen below how these instructions are given to the machine.
Note: Reprinted from Mathematical Tables and Other Aids to Computation, 1946.

gord1998
URL
¿Web? A Calculated Look at Fixed-Point Arithmetic, Robert Gordon, Embedded Systems Programming, Vol. 11 #4, pp72–78, Miller Freeman, Inc, April 1998.
Abstract: This article explores the subject of fixed-point numbers and presents techniques you can use to implement efficient, fixed-precision number applications.

gray2003
¿Web? Before the B5000: Burroughs Computers, 1951-1963, George T. Gray and Ronald Q. Smith, IEEE Annals of the History of Computing, Vol. 25 #2, pp50–61, IEEE, April-June 2003.
Abstract: Like many companies entering the computer industry, Burroughs began by working on US government contracts. Once sufficient expertise had been gained, the company entered the general purpose computer market. The Datatron computer, obtained through the ElectroData Corporation acquisition, was a modest success in the late 1950s; however, pioneering work on transistor computers for military contracts was not immediately transferred to the commercial marketplace.

hack2004
URL
¿Web? On Intermediate Precision Required for Correctly-Rounding Decimal-to-Binary Floating-Point Conversion., Michel Hack, Proceedings of RNC6 (6th conference on Real Numbers and Computers), URL: http://www.informatik.uni-trier.de/Reports/TR-08-2004/rnc6_10_hack.pdf, 22pp, University of Trier, November 2004.
Abstract: The algorithms developed ten years ago in preparation for IBM�s support of IEEE Floating-Point on its mainframe S/390 processors use an overly conservative intermediate precision to guarantee correctly-rounded results across the entire exponent range. Here we study the minimal requirement for both bounded and unbounded precision on the decimal side (converting to machine precision on the binary side). An interesting new theorem on Continued Fraction expansions is offered, as well as an open problem on the growth of partial quotients for ratios of powers of two and five.

hamilton1954
¿Web? The IBM Magnetic Drum Calculator Type 650, F. E. Hamilton and E. C. Kubie, Journal of the ACM, Vol. 1 #1, pp13–20, ACM Press, January 1954.
Abstract: The IBM Magnetic Drum Calculator Type 650 is an electronic calculator intermediate in speed, capacity and cost. It takes a logical position between the IBM Card Programmed Electronic Calculator and the IBM Electronic Data Processing Machines Type 701. It is a more powerful computing tool as required by those who have “outgrown” the Card Programmed Electronic Calculator. It is also a machine which may be used economically by those who are not as yet ready for a large scale computer such as the 701. It will serve not only to perform their required computing tasks, but it will also result in gaining valuable experience for later use of large scale equipment. The Magnetic Drum Calculator, through its stored program control, comprehensive order list, punched card input-output, self-checking and moderate memory capacity, gains the flexibility required of a computer which is to serve in both the commercial and scientific computing fields...

hickmann2007
¿Web? A Parallel IEEE P754 Decimal Floating-Point Multiplier, Brian J. Hickmann, Andrew Krioukov, Michael J. Schulte, and Mark A. Erle, Proceedings of the IEEE International Conference on Computer Design 2007, pp296–303, IEEE, October 2007.
Abstract: Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. This design is based on a previously published parallel fixed-point decimal multiplier which uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components including exponent generation, rounding, shifting, and exception handling. Area and delay estimates are presented that show a significant latency and throughput improvement with a substantial increase in area as compared to the only published IEEE P754 compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.

hull1991
¿Web? Specifications for a Variable-Precision Arithmetic Coprocessor, T. E. Hull, M. S. Cohen, and C. B. Hall, Proceedings. 10th Symposium on Computer Arithmetic, ISBN 0-8186-9151-4, pp127–131, IEEE, 1991.
Abstract: The authors have been developing a programming system intended to be especially convenient for scientific computing. Its main features are variable precision (decimal) floating-point arithmetic and convenient exception handling. The software implementation of the system has evolved over a number of years, and a partial hardware implementation of the arithmetic itself was constructed and used during the early stages of the project. Based on this experience, the authors have developed a set of specifications for an arithmetic coprocessor to support such a system. The main purpose of this paper is to describe these specifications. An outline of the language features and how they can be used is also provided, to help justify our particular choice of coprocessor specifications.

ibm1998
URL
¿Web? Decimal Arithmetic Instructions, IBM, ESA/390 Principles of Operation, Chapter 8, IBM, 1998.
Abstract: The decimal instructions of this chapter perform arithmetic and editing operations on decimal data. Additional operations on decimal data are provided by several of the instructions in Chapter 7, “General Instructions”. Decimal operands always reside in storage, and all decimal instructions use the SS instruction format. Decimal operands occupy storage fields that can start on any byte boundary.

iguchi2007a
URL
¿Web? On Designs of Radix Converters using Arithmetic Decompositions, Yukihiro Iguchi, Tsutomu Sasao, and Munehiro Matsuura, Proceedings of ISMVL-2007, Oslo, Norway (CD-ROM), 8pp, IEEE, May 2007.
Abstract: In digital signal processing, radixes other than two are often used for high-speed computation. In the computation for finance, decimal numbers are used instead of binary numbers. In such cases, radix converters are necessary. This paper considers design methods for binary to q-nary converters. It introduces a new design technique based on weighted-sum (WS) functions. The method computes a WS function for each digit by an LUT cascade and a binary adder, then adds adjacent digits with q-nary adders. A 16-bit binary to decimal converter is designed to show the method.

iguchi2007b
URL
¿Web? Design Methods of Radix Converters using Arithmetic Decompositions, Yukihiro Iguchi, Tsutomu Sasao, and Munehiro Matsuura, Institute of Electronics, Information and Communication Engineers, Transactions on Information and Systems, Vol. E90-D #6, pp905–914, IEICE, June 2007.
Abstract: In arithmetic circuits for digital signal processing, radixes other than two are often used to make circuits faster. In such cases, radix converters are necessary. However, in general, radix converters tend to be complex. This paper considers design methods for p-nary to binary converters. First, it considers Look-Up Table (LUT) cascade realizations. Then, it introduces a new design technique called arithmetic decomposition by using LUTs and adders. Finally, it compares the amount of hardware and performance of radix converters implemented by FPGAs. 12-digit ternary to binary converters on Cyclone II FPGAs designed by the proposed method are faster than ones by conventional methods.

james2007
¿Web? Quick Addition of Decimals Using Reversible Conservative Logic, Rekha K. James, Shahana T. K., K. Poulose Jacob, and Sreela Sasi, 15th International Conference on Advanced Computing and Communications (ADCOM 2007),, ISBN 0-7695-3059-1, pp191–196, IEEE Computer Society, December 2007.
Abstract: In recent years, reversible logic has emerged as one of the most important approaches for power optimization with its application in low power CMOS, nanotechnology and quantum computing. This research proposes quick addition of decimals (QAD) suitable for multi-digit BCD addition, using reversible conservative logic. The design makes use of reversible fault tolerant Fredkin gates only. The implementation strategy is to reduce the number of levels of delay there by increasing the speed, which is the most important factor for high speed circuits.

jimeno2008
¿Web? A BCD-based architecture for fast coordinate rotation, Antonio Jimeno, Higinio Mora, Jose L. Sanchez, and Francisco Pujol, Journal of Systems Architecture: the EUROMICRO Journal, Vol. 54 #8, ISSN 1383-7621, pp829–840, Elsevier, August 2008.
Abstract: Although radix 10 based arithmetic has been gaining renewed importance over the last few years, decimal systems are not efficient enough and techniques are still under development. In this paper, an improvement of the CORDIC (coordinate rotation digital computer) method for decimal representation is proposed and applied to produce fast rotations. The algorithm uses BCD operands as inputs, combining the advantages of both decimal and binary systems. The result is a reduction of 50% in the number of iterations if compared with the original Decimal CORDIC method. Finally, we present a hardware architecture useful to produce BCD coordinates rotations accurately and fast, and different experiments demonstrating the advantages of the new method are shown. A reduction of 75% in a single stage delay is obtained, whereas the circuit area just increases in about 5%.

johan1980
¿Web? Decimal Shifting for an Exact Floating Point Representation, J. D. Johannes, C. Dennis Pegden, and F. E. Petry, Computers and Electrical Engineering, Vol. 7 #3, pp149–155, Elsevier, September 1980.
Abstract: A floating point representation which permits exact conversion of decimal numbers is discussed. This requires the exponent to represent a power of ten, and thus decimal shifts of the mantissa are needed. A specialized design is analyzed for the problem of division by ten, which is needed for decimal shifting.

johnst2001
¿Web? Architecture and Algorithms for Processing Non-binary Floating Point Radices, Paul Johnstone and Frederick E. Petry, unpublished paper, 39pp, pers. comm., July 2001.
Abstract: Recent studies have proposed several non-binary floating point representations which possess most of the storage and algorithmic efficiencies of traditional binary systems with no sacrifice of precision and only modest reductions in range. Such systems possess inherent advantages in that they employ less complicated conversion algorithms and are less prone to errors in representation. Additionally, non-binary systems tend to produce more precise arithmetic results in that common problem of truncation of an infinitely repeating quotient occurs with a lesser frequency.
    However, as has been previously observed, traditional binary floating representations are most efficiently adapted to the prevailing choices of technology and system architecture. Previous research has left undone the quantification and evaluation of the algorithms and componentry necessary to effect the proposed representations in a fully realized system. We consider in this study the expected impact of adding the capacity to process one of the proposed non-binary radix representations within a conventional computer system. Since decimal representations are clearly the overwhelming impetus for these studies, discussion will focus solely on base 10 systems. Examination of implementation issues are directed toward the following areas: the implementation of floating point representations in contemporary computer architectures, the design of any extensions to such systems, the effects on system complexity and cost, and, finally, resulting algorithmic revisions.

jones1962
¿Web? Floating Point Feature On The IBM Type 1620, F. B. Jones and A. W. Wymore, IBM Technical Disclosure Bulletin, 05-62, pp43–46, IBM, May 1962.
Abstract: In the type 1620 automatic floating point operations, a floating point number is a field consisting of a variable length mantissa and a two digit exponent. The exponent is in the two low order positions of the field, and the mantissa is in the remaining high order positions, |M.....M|EE.
    The most significant digit positions are marked by flags and the algebraic signs are marked by flags over the least significant digit positions. The exponent is established on the premise that the mantissa is less than 1.0 and equal to or greater than 0.1, and has a range of -99 to +99. The smallest positive quantity that can be represented is thus 00.... 099. The mantissa may have from two to one hundred digits. ...

kaivani2006
¿Web? Reversible Implementation of Densely-Packed-Decimal Converter to and from Binary-Coded-Decimal Format Using in IEEE-754R, A. Kaivani, A. Zaker Alhosseini, S. Gorgin, and M. Fazlali, 9th International Conference on Information Technology (ICIT'06), pp273–276, IEEE, December 2006.
Abstract: The Binary Coded Decimal (BCD) encoding has always dominated the decimal arithmetic algorithms and their hardware implementation. Due to importance of decimal arithmetic, the decimal format defined in lEEE 754 floating point standard has been revisited. It uses Densely Packed Decimal (DPD) encoding to store significand part of a decimal floating point number. Furthermore in recent years reversible logic has attracted the attention of engineers for designing low power CMOS circuits, as it is not possible to realize quantum compufing withouf reversible logic implementation. This paper derives the reversible implementation of DPD converter to and from conventional BCD format using in IEEE 754R.

kenney2004a
¿Web? Multioperand Decimal Addition (extended version), Robert D Kenney and Michael J Schulte, Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Lafayette, LA, February, 2004., 10pp, IEEE, February 2004.
Abstract: This paper introduces and analyzes four techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Three of the techniques speculate BCD correction values and use chaining to correct intermediate results. The first speculates over one addition. The second speculates over two additions. The third employs multiple instances of the second technique in parallel and then merges the results. The fourth technique uses a binary carry-save adder tree and produces a binary sum. Combinational logic is then used to correct the sum and determine the carry into the next digit. Multioperand adder designs are constructed and synthesized for four to sixteen input operands. Analyses are performed on the synthesis results and the merits of each technique are discussed. Finally, these techniques are compared to previous attempts made at speeding up decimal addition.

kenney2004b
¿Web? High-Frequency Decimal Multiplier, Robert D Kenney, Michael J Schulte, and Mark A. Erle, Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors, ISBN 0 7695 2231 9, pp26–29, IEEE, October 2004.
Abstract: Decimal arithmetic is regaining popularity in the computing community due to the growing importance of commercial, financial, and Internet-based applications, which process decimal data. This paper presents an iterative decimal multiplier, which operates at high clock frequencies and scales well to large operand sizes. The multiplier uses a new decimal representation for intermediate products, which allows for a very fast two- stage iterative multiplier design. Decimal multipliers, which are synthesized using a 0.11 micron CMOS standard cell library, operate at clock frequencies close to 2 GHz. The latency of the proposed design to multiply two n-digit BCD operands is (n + 8) cycles with a new multiplication able to begin every (n + 1) cycles.

kenney2005
¿Web? High-speed multioperand decimal adders, R.D. Kenney and M. J. Schulte, IEEE Transactions on Computers, Vol. 54 #8, ISSN 0018-9340, pp953–963, IEEE, August 2005.
Abstract: There is increasing interest in hardware support for decimal arithmetic as a result of recent growth in commercial, financial, and Internet-based applications. Consequently, new specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic. This paper introduces and analyzes three techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Two of the techniques speculate BCD correction values and correct intermediate results while adding the input operands. The first speculates over one addition. The second speculates over two additions. The third technique uses a binary carry-save adder tree and produces a binary sum. Combinational logic is then used to correct the sum and determine the carry into the next more significant digit. Multioperand adder designs are constructed and synthesized for four to 16 input operands. Analyses are performed on the synthesis results and the merits of each technique are discussed. Finally, these techniques are compared to several previous techniques for high-speed decimal addition.

kleinsteiber1980
¿Web? IBM 4341 hardware/microcode trade-off decisions, James R. Kleinsteiber, MICRO 13: Proceedings of the 13th annual workshop on Microprogramming, pp190–192, ACM Press, December 1980.
Abstract: The design of IBM’s 4341 Processor, as with other processors, involved many cost/performance tradeoffs. The designer is continually under pressure to increase processor speed without increasing cost or to decrease processor cost without decreasing performance. This paper will examine some of the engineering decisions that were made in the attempt to make the 4341 a high-performing yet low cost processor. These decisions include searching for, or developing, algorithms that make the best use of hardware properties, such as data path width, arithmetic/logical operations and special functions. Functions were sought such that a small amount of added hardware would go a long way towards improving system performance. Hardware designers, microcoders and performance analysis people worked together to implement instructions, functions and algorithms with the proper mixture of hardware functions and microcode in order to build a viable processor. Some specific functions will be covered to examine a few of the decisions. The TEST UNDER MASK performance problem will be discussed with its resulting implementation decision. The method of using EXCLUSIVE OR to clear storage and the resulting algorithm design will be shown. Other topics to be discussed include multiple hardware functions and the resulting effect on floating point, fixed point and decimal multiply; the divide function and its effect on floating point and fixed point divide; and the effect of an 8-byte data path for decimal arithmetic.
Note: Also published in December 1980 SIGMICRO Newsletter Volume 11 Issue 3-4

klerer1967
¿Web? Chapt. 1.4 Computer Characteristics Table, Melvin Klerer et al, Digital Computer User's Handbook, 67pp, McGraw-Hill, NY, 1967.
Abstract: Section I: General-purpose Solid-state Computers Manufactured in the United States and Designed for a Wide Variety of Business and Scientific Applications
   Section II: Systems Manufactured in the United States with General-purpose Capabilities but Used Principally in Process Control, Message Switching, and Other Specialized Applications
   Section III: General-purpose Computers Manufactured in Countries Other Than the United States
   Section IV: Vacuum-tube Computers No Longer Manufactured but Still in Use
   Section V: Chronological Listing of Vacuum-tube and Solid-state Computers Manufactured in the United States and Installed between 1951 and 1965

knuth1986
¿Web? The IBM 650: An Appreciation from the Field, Donald E. Knuth, IEEE Annals of the History of Computing, Vol. 8 #1, pp50–55, IEEE, January-March 1986.
Abstract: I suppose it was natural for a person like me to fall in love with his first computer. But there was something special about the IBM 650, something that has provided the inspiration for much of my life�s work. Somehow this machine was powerful in spite of its severe limitations. Somehow it was friendly in spite of its primitive man-machine interface...

lake1962
¿Web? Hardware Conversion of Decimal and Binary Numbers, G. T. Lake, Communications of the ACM, Vol.5 #9, pp468–469, ACM Press, September 1962.

lang2006
¿Web? A Radix-10 Combinational Multiplier, Tomás Lang and Alberto Nannarelli, Proceedings of 40th Asilomar Conference on Signals, Systems, and Computers, pp313–317, IEEE, October 2006.
Abstract: In this work, we present a combinational decimal multiply unit which can be pipelined to reach the desired throughput. With respect to previous implementations of decimal multiplication, the proposed unit is combinational (parallel) and not sequential, has a simpler recoding of the operands which reduces the number of partial product precomputations and uses counters to eliminate the need of the decimal equivalent of a 4:2 adder. The results of the implementation show that the combinational decimal multiplier offers a good compromise between latency and area when compared to other decimal multiply units and to binary double-precision multipliers.

lang2007
URL
¿Web? A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture, Tomás Lang and Alberto Nannarelli, IEEE Transactions on Computers, Vol. 56 #6, pp727–739, IEEE, June 2007.
Abstract: In this work, we present a radix-10 division unit that is based on the digit-recurrence algorithm. The previous decimal division designs do not include recent developments in the theory and practice of this type of algorithm, which were developed for radix-2k dividers. In addition to the adaptation of these features, the radix-10 quotient digit is decomposed into a radix-2 digit and a radix-5 digit in such a way that only five and two times the divisor are required in the recurrence. Moreover, the most significant slice of the recurrence, which includes the selection function, is implemented in radix-2, avoiding the additional delay introduced by the radix-10 carry-save additions and allowing the balancing of the paths to reduce the cycle delay. The results of the implementation of the proposed radix-10 division unit show that its latency is close to that of radix-16 division units (comparable dynamic range of significands) and it has a shorter latency than a radix-10 unit based on the Newton-Raphson approximation.

lynch1962
¿Web? On a Wired-In Binary-to-Decimal Conversion Scheme, W. C. Lynch, Communications of the ACM, Vol. 5 #3, pp159–159, ACM Press, March 1962.

mazor2002
¿Web? Fairchild decimal arithmetic unit, Stan Mazor, 9pp, pers. comm., July–September 2002.
Abstract: We embarked on the design of Symbol II [circa 1966], a large scale HIGH LEVEL language, virtual memory, time sharing machine. This machine used large printed circuit boards, approx. 16″ x 20″ with slots for over 210 DIP’s. We had 100 connector pins on each side and we defined the system using a number of parallel busses with multiple autonomous functional units and inter-processor communication. The completed system had over 110 printed circuit boards and consumed mega-watts of power...

moskal2007
¿Web? Design and Synthesis of a Carry-Free Signed-Digit Decimal Adder, John Moskal, Erdal Oruklu, and Jafar Saniie, IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp1089–1092, IEEE, May 2007.
Abstract: The decimal arithmetic has been receiving an increased attention because of the growth of financial and scientific applications requiring high precision and increased computing power. This paper presents an efficient architecture for multi-digit decimal addition based on carry-free signed-digit numbers. In this study, the decimal adder architecture has been designed and synthesized using the TSMC 0.18mu technology. The synthesis results were compared to the existing decimal adders with respect to design area, delay and power consumption. These results show that proposed adder architecture improves the area-delay factor by 3 for a 32 digit adder.

neukom2005
¿Web? ERMETH: The First Swiss Computer, Hans Heukom, IEEE Annals of the History of Computing, pp5–22, IEEE, October 2005.
Abstract: Eduard Stiefel, in 1948 the first director of the Federal Institute of Technology�s newly established Institute of Applied Mathematics, recognized that computers would be essential to this new field of mathematics. Unable to find exactly what he wanted in existing computers, Stiefel developed the ERMETH. This article examines the rationale of, and objectives for, the first Swiss computer.

nikmehr2004
¿Web? A decimal carry-free adder, Hooman Nikmehr, Braden Phillips, and Cheng-Chew Lim, SPIE Symposium Smart Materials, Nano-, and Micro-Smart Systems, Proceedings of SPIE Vol. 5649, 12pp, SPIE International Society for Optical Engineering, December 2004.
Abstract: Recently, decimal arithmetic has become attractive in the financial and commercial world including banking, tax calculation, currency conversion, insurance and accounting. Although computers are still carrying out decimal calculation using software libraries and binary floating-point numbers, it is likely that in the near future, all processors will be equipped with units performing decimal operations directly on decimal operands. One critical building block for some complex decimal operations is the decimal carry-free adder. This paper discusses the mathematical framework of the addition, introduces a new signed-digit format for representing decimal numbers and presents an efficient architectural implementation. Delay estimation analysis shows that the adder offers improved performance over earlier designs.

nikmehr2006
¿Web? Fast Decimal Floating-Point Division, Hooman Nikmehr, Braden Phillips, and Cheng-Chew Lim, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14 #9, ISSN 1063-8210, pp951–961, IEEE, September 2006.
Abstract: A new implementation for decimal floating-point (DFP) division is introduced. The algorithm is based on high-radix SRT division. The SRT division algorithm is named after D. Sweeney, J. E. Robertson, and T. D. Tocher, with the recurrence in a new decimal signed-digit format. Quotient digits are selected using comparison multiples, where the magnitude of the quotient digit is calculated by comparing the truncated partial remainder with limited precision multiples of the divisor. The sign is determined concurrently by investigating the polarity of the truncated partial remainder. A timing evaluation using a logic synthesis shows a significant decrease in the division execution time in contrast with one of the fastest DFP dividers reported in the open literature.

peuto1977
¿Web? An instruction timing model of CPU performance, Bernard L. Peuto and Leonard J. Shustek, Proceedings of the 4th annual symposium on Computer architecture, pp165–178, ACM Press, 1977.
Abstract: A model of high-performance computers is derived from instruction timing formulas, with compensation for pipeline and cache memory effects. The model is used to predict the performance of the IBM 370/168 and the Amdahl 470 V/6 on specific programs, and the results are verified by comparison with actual performance. Data collected about program behavior is combined with the performance analysis to highlight some of the problems with high-performance implementations of such architectures.

peuto1998
¿Web? An Instruction Timing Model of CPU Performance, Bernard L. Peuto and Leonard J. Shustek, International Conference on Computer Architecture: 25 years of the International Symposia on Computer architecture, pp152–165, ACM Press, 1998.
Abstract: A model of high-performance computers is derived from instruction timing formulas, with compensation for pipeline and cache memory effects. The model is used to predict the performance of the IBM 370/168 and the Amdahl 470 V/6 on specific programs, and the results are verified by comparison with actual performance. Data collected about program behavior is combined with the performance analysis to highlight some of the problems with high-performance implementations of such architectures.
Note: Original reference: ISCA 1977: pp165-178.

rich1955
¿Web? Arithmetic Operations in Digital Computers, R. K. Richards, ISBN (none), 397pp, D. Van Nostrand Co., NY, 1955.
Abstract: Among the first things that are learned in a study of mathematics are rules and procedures for performing basic arithmetic operations, notably addition, subtraction, multiplication, and division. The rules and procedures taught in school are, for the most part, aimed at making the operations as simple and speedy as possible when a pencil and a piece of paper are the only tools. In the design of more elaborate arithmetical tools, it is usually found necessary or at least highly desirable to devise new methods for executing the various arithmetic operations. ...
Note: Library of Congress No. 55-6234. Bibliography 9pp.

rosen1969
URL
¿Web? Electronic Computers: A Historical Survey, Saul Rosen, ACM Computing Surveys (CSUR), Vol. 1 #1, ISSN 0360-0300, pp7–36, ACM Press, March 1969.
Abstract: The first large scale electronic computers were built in connection with university projects sponsored by government military and research organizations. Many established companies, as well as new companies, entered the computer field during the first generation, 1947-1959, in which the vacuum tube was almost universally used as the active component in the implementation of computer logic. The second generation was characterized by the transistorized computers that began to appear in 1959. Some of the computers built then and since are considered super computers; they attempt to go to the limit of current technology in terms of size, speed, and logical complexity. From 1965 onward, most new computers belong to a third generation, which features integrated circuit technology and multiprocessor multiprogramming systems.

sacks1982
¿Web? Applications of Redundant Number Representations to Decimal Arithmetic, R. Sacks-Davis, The Computer Journal, Vol. 25 #4, pp471–477, November 1982.
Abstract: A decimal arithmetic unit is proposed for both integer and floating-point computations. To achieve comparable speed to a binary arithmetic unit, the decimal unit is based on a redundant number representation. With this representation no loss of compactness is made relative to binary coded decimal (BCD) form. In this paper the hardware required for the implementation of the basic operations of addition, subtraction, multiplication and division are described and the properties of floating-point arithmetic based on a redundant number representation are investigated.

sasao2005
¿Web? Radix Converters: Complexity and Implementation by LUT Cascades, Tsutomu Sasao, 35th International Symposium on Multiple-Valued Logic (ISMVL'05), pp256–263, IEEE, May 2005.
Abstract: In digital signal processing, we often use higher radix system to achieve high-speed computation. In such cases, we require radix converters. This paper considers the design of LUT cascades that convert ��-nary numbers to -nary numbers. In particular, we derive several upper bounds on the column multiplicities of decomposition charts that represent radix converters. From these, we can estimate the size of LUT cascades to realize radix converters. These results are useful to design compact radix converters, since these bounds show strategies to partition the outputs into groups.

schmid1968
¿Web? An electronic digital slide rule, Hermann Schmid and David Busch, The Electronic Engineer, pp54–64, July 1968.
Abstract: The Electronic Digital Slide Rule (EDSR) of the future not only will be smaller and easier to operate than the conventional slide rule, but it will also be more accurate.

schmid1974
¿Web? Decimal Computation, Hermann Schmid, ISBN 047176180X, 266pp, Wiley, 1974.
Abstract: This book is thus a collection, a catalog, and a review of BCD computation techniques. The book describes how each of the most common arithmetic and transcendental operations can be implemented in a variety of ways. ... covers ... A review of number systems, BCD codes, of early calculating instruments and electronic calculating machines ... An outline of BCD computing circuit applications in the automotive, consumer, education, and entertainment fields, illustrated with some specific examples ... Mathematical developments of the algorithms ... Discussions and comparisons of circuit complexity and performance (accuracy, resolution, and speed of operation) for the different algorithms ...
Note: Reprinted 1983, ISBN 0-89874-318-4, Robert E. Krieger Publishing Co.

schmoo1968
¿Web? High Speed Binary to Decimal Conversion, M. S. Schmookler, IEEE Transactions on Computers, Vol. C-17, pp506–508, IEEE, 1968.
Abstract: This note describes several methods of performing fast, efficient, binary-to-decimal conversion. With a modest amount of circuitry, an order of magnitude speed improvement can is obtained. This achievement offers a unique advantage to general-purpose computers requiring special hardware to translate between binary and decimal numbering systems.

schmoo1971
¿Web? High speed decimal addition, Martin S. Schmookler and Arnold Weinberger, IEEE Transactions on Computers, Vol. C-20 #8, pp862–867, IEEE, August 1971.
Abstract: Parallel decimal arithmetic capability is becoming increasingly attractive with new applications of computers in a multiprogramming environment. The direct production of decimal sums offers a significant improvement in addition over methods requiring decimal correction. These techniques are illustrated in the eight-digit adder which appears in the System/360 Model 195.

schulte2005
URL
¿Web? Performance Evaluation of Decimal Floating-Point Arithmetic, Michael J. Schulte, Nick Lindberg, and Anitha Laxminarain, Proceedings of the 6th IBM Austin Center for Advanced Studies Conference, Austin, TX,, 8pp, IBM, February 2005.
Abstract: The prominence of decimal data in commercial and financial applications has led researchers to pursue efficient techniques for performing decimal floating-point arithmetic. While several software implementations of decimal floating-point arithmetic have been implemented, there is a growing need to provide hardware support for decimal floating-point arithmetic to keep up with the processing demands of emerging commercial and financial applications. This paper evaluates and compares the performance of decimal floating-point arithmetic operations when implemented on superscalar processors using either software libraries or specialized hardware designs. Our comparisons show that hardware implementations of decimal floating-point arithmetic operations are one to two orders of magnitude faster than software implementations.

schwarz2002
¿Web? The microarchitecture of the IBM eServer z900 processor, Eric M. Schwarz et al, IBM Journal of Research and Development, Vol. 46 #4/5, pp381–395, IBM, July/September 2002.
Abstract: The recent IBM ESA/390 CMOS line of processors, from 1997 to 1999, consisted of the G4, G5, and G6 processors. The architecture they implemented lacked 64-bit addressability and had only a limited set of 64-bit arithmetic instructions. The processors also lacked data and instruction bandwidth, since they utilized a unified cache. The branch performance was good, but there were delays due to conflicts in searching and writing the branch target buffer. Also, the hardware data compression and decimal arithmetic performance, though good, was in demand by database and COBOL programmers. Most of the performance concerns regarding prior processors were due to area constraints. Recent technology advances have increased the circuit density by 50 percent over that of the G6 processor. This has allowed the design of several performance-critical areas to be revisited. The end result of these efforts is the IBM eServer z900 processor, which is the first high-end processor based on the new 64-bit z/Architecture^TM.

schwarz2009
URL
¿Web? Decimal floating-point support on the IBM System z10 processor, Eric M. Schwarz, John S. Kapernick, and Mike F. Cowlishaw, IBM Journal of Research and Development, Vol. 53 #1, pp4:1–4:10, IBM, January 2009.
Abstract: The latest IBM zSeries processor, the IBM System z10 processor, provides hardware support for the decimal floating-point (DFP) facility that was introduced on the IBM System z9 processor. The z9 processor implements the facility with a mixture of low-level software and hardware assists. Recently, the IBM POWER6 processor-based System p 570 server introduced a hardware implementation of the DFP facility. The latest zSeries processor includes a decimal floating-point unit based on the POWER6 processor DFP unit that has been enhanced to also support the traditional zSeries decimal fixed-point instruction set. This paper explains the hardware implementation to support both decimal fixed point and DFP and the new software support for the DFP facility, including IBM z/OS, Java JIT, and C/C++ compilers, as well as support in IBM DB2 and middleware.

shirazi1988
¿Web? VLSI designs for redundant binary-coded decimal addition, Behrooz Shirazi, David Y. Y. Yun, and Chang N. Zhang, IEEE Seventh Annual International Phoenix Conference on Computers and Communications, 1988, pp52–56, IEEE, March 1988.
Abstract: Binary-coded decimal (BCD) system provides rapid binary-decimal conversion. However, BCD arithmetic operations are often slow and require complex hardware. One can eliminate the need for carry propagation and thus improve performance of BCD operations by using a redundant binary-coded decimal (RBCD) system. This paper introduces the VLSI design of an RBCD adder. The design consists of two small PLA’s and two four-bit binary adders for one digit of the RBCD adder. The addition delay is constant for n-digit RBCD addition (no carry propagation delay). The VLSI time and space complexities of the design as well as its layout are presented, showing the regularity of the structures. In addition, two simple algorithms and the corresponding hardware designs for conversion between RBCD and BCD are presented.

sites1974
¿Web? Serial Binary Division by Ten, R. L. Sites, IEEE Transactions on Computers, Vol. 23 #12, ISSN 0018-9340, pp1299–1301, IEEE, December 1974.
Abstract: A technique is presented for dividing a positive binary integer by ten, in which the bits of the input are presented serially, low-order bit first. A complete division by ten is performed in two word times (comparable to the time needed for two serial additions). The technique can be useful in serial conversions from binary to decimal, or in scaling binary numbers by powers of 10.

svoboda1969
¿Web? Decimal Adder with Signed Digit Arithmetic, Antonin Svoboda, IEEE Transactions on Computers, Vol. 18 #3, pp212–215, IEEE, March 1969.
Abstract: The decimal adder with signed digit arithmetic presented here was designed to establish the following facts: the redundant representation of a decimal digit x_i by a 5-bit binary number X_i=3x_i leads to a logical design of extreme simplicity; it is possible to form an additional algorithm for the adder so that it can be used to transform numbers written in a conventional decinal form into a signed digit form, and vice versa.

thapliyal2006
URL
¿Web? Novel BCD Adders and Their Reversible Logic Implementation for IEEE 754r Format, Himanshu Thapliyal, Saurabh Kotiyal, and M. B. Srinivas, Proceeding of the 19th International Conference on VLSI Design (VLSID’06), pp387–392, IEEE, 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard and a major enhancement to the standard is the addition of decimal format. This paper proposes two novel BCD adders called carry skip and carry look-ahead BCD adders respectively. Furthermore, in the recent years, reversible logic has emerged as a promising technology having its applications in low power CMOS, quantum computing, nanotechnology, and optical computing. It is not possible to realize quantum computing without reversible logic. Thus, this paper also provides the reversible logic implementation of the conventional BCD adder as the well as the proposed Carry Skip BCD adder using a recently proposed TSG gate. Furthermore, a new reversible gate called TS-3 is also being proposed and it has been shown that the proposed reversible logic implementation of the BCD Adders is much better compared to recently proposed one, in terms of number of reversible gates used and garbage outputs produced. The reversible BCD circuits designed and proposed here form the basis of the decimal ALU of a primitive quantum CPU.

thapliyal2006b
¿Web? Modified Carry Look Ahead BCD Adder With CMOS and Reversible Logic Implementation, Himanshu Thapliyal and Hamid R. Arabnia, Proceedings of the 2006 International Conference on Computer Design (CDES'06), ISBN 1-60132-009-4, pp64–69, CSREA Press, November 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard and a major enhancement to the standard is the addition of decimal format. Firstly, this paper proposes novel two transistor AND & OR gates. The proposed AND gate has no power supply, thus it can be referred as the Powerless AND gate. Similarly, the proposed two transistor OR gate has no ground and can be referred as Groundless OR. Two designs of AND & OR gate without VDD or GND are also shown. Secondly for IEEE 754r format, one novel BCD adder called carry look-ahead BCD adder is also proposed. In order to design the carry look-ahead BCD adder, a novel 4 bit carry look-ahead adder called NCLA is proposed which forms the basic building block of the proposed carry look-ahead BCD adder. The proposed two transistors AND & OR gates are used to provide the optimized small area, low power, high throughput circuitries of the proposed BCD adder. Nowadays, reversible logic is also emerging as a promising computing paradigm having its applications in quantum computing, optical computing and nanotechnology. Thus, reversible logic implementation of the proposed BCD Adder is also shown in this paper.

thapliyal2006c
¿Web? Design of Novel Reversible Carry Look-Ahead BCD Subtractor, Himanshu Thapliyal and Sumedha K. Gupta, Proceedings of the 9th International Conference on Information Technology (ICIT'06), ISBN 0-7695-2635-7, pp253–258, IEEE, December 2006.
Abstract: IEEE 754r is the ongoing revision to the IEEE 754 floating point standard. A major enhancement to the standard is the addition of decimal format, thus the design of BCD arithmetic units is likely to get significant attention. Firstly, this paper introduces a novel carry look-ahead BCD adder and then builds a novel carry look-ahead BCD subtractor based on it. Secondly, it introduces the reversible logic implementation of the proposed carry look-ahead BCD subtractor. We have tried to design the reversible logic implementation of the BCD Subtractor optimal in terms of number of reversible gates used and garbage outputs produced. Thus, the proposed work will be of significant value as the technologies mature.

thomp2004
¿Web? A 64-bit Decimal Floating-Point Adder (extended version), John Thompson, Nandini Karra, and Michael J Schulte, Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Lafayette, LA, February, 2004., pp297–298, IEEE, February 2004.
Abstract: Due to the rapid growth in financial, commercial, and Internet-based applications, there is an increasing desire to allow computers to operate on both binary and decimal floating-point numbers. Consequently, specifications for decimal floating-point arithmetic are being added to the IEEE-754 Standard for Floating-Point Arithmetic. In this paper, we present the design and implementation of a decimal floating-point adder that is compliant with the current draft revision of the IEEE-754 Standard. The adder supports operations on 64-bit (16-digit) decimal floating-point operands. We provide synthesis results indicating the estimated area and delay for our design when it is pipelined to various depths.

thomsen2008
¿Web? Optimized reversible binary-coded decimal adders, Michael Kirkedal Thomsen and Robert Gl�ck, Journal of Systems Architecture: the EUROMICRO Journal, Vol. 54 #7, ISSN 1383-7621, pp697–706, Elsevier, July 2008.
Abstract: Babu and Chowdhury recently proposed, in this journal, a reversible adder for binary-coded decimals. This paper corrects and optimizes their design. The optimized 1-decimal BCD full-adder, a 13x13 reversible logic circuit, is faster, and has lower circuit cost and less garbage bits. It can be used to build a fast reversible m-decimal BCD full-adder that has a delay of only m+17 low-power reversible CMOS gates. For a 32-decimal (128-bit) BCD addition, the circuit delay of 49 gates is significantly lower than is the number of bits used for the BCD representation. A complete set of reversible half- and full-adders for n-bit binary numbers and m-decimal BCD numbers is presented. The results show that special-purpose design pays off in reversible logic design by drastically reducing the number of garbage bits. Specialized designs benefit from support by reversible logic synthesis. All circuit components required for optimizing the original design could also be synthesized successfully by an implementation of an existing synthesis algorithm.

tsen2007a
¿Web? Hardware Design of a Binary Integer Decimal-based IEEE P754 Rounding Unit, Charles Tsen, Michael J. Schulte, and Sonia Gonzalez-Navarro, Proceedings of the IEEE 18th International International Conference on Application-specific Systems, Architectures and Processors (ASAP), 7pp, IEEE, July 2007.
Abstract: Because of the growing importance of decimal floating-point (DFP) arithmetic, specifications for it were recently added to the draft revision of the IEEE 754 Standard (IEEE P754). In this paper, we present a hardware design for a rounding unit for 64-bit DFP numbers (decimal64) that use the IEEE P754 binary encoding of DFP numbers, which is widely known as the Binary Integer Decimal (BID) encoding. We summarize the technique used for rounding, present the theory and design of the BID rounding unit, and evaluate its critical path delay, latency, and area for combinational and pipelined designs. Over 86% of the rounding unit’s area is due to a 55-bit by 54-bit binary multiplier, which can be shared with a double-precision binary floating-point multiplier. To our knowledge, this is the first hardware design for rounding IEEE P754 BID-encoded DFP numbers.

tsen2007b
¿Web? Hardware Design of a Binary Integer Decimal-based Floating-point Adder, Charles Tsen, Sonia Gonzalez-Navarro, and Michael J. Schulte, Proceedings of the IEEE 25th International Conference on Computer Design, 9pp, IEEE, October 2007.
Abstract: Because of the growing importance of decimal floating-point (DFP) arithmetic, specifications for it are included in the IEEE Draft Standard for Floating-point Arithmetic (IEEE P754). In this paper, we present a novel algorithm and hardware design for a DFP adder. The adder performs addition and subtraction on 64-bit operands that use the IEEE P754 binary encoding of DFP numbers, widely known as the Binary Integer Decimal (BID) encoding. The BID adder uses a novel hardware component for decimal digit counting and an enhanced version of a previously published BID rounding unit. By adding more sophisticated control, operations are performed with variable latency to optimize for common cases. We show that a BID-based DFP adder design can be achieved with a modest area increase compared to a single 2-stage pipelined 64-bit fixed-point multiplier. Over 70% of the BID adder�s area is due the 64-bit fixed-point multiplier, which can be shared with a binary floating-point multiplier and hardware for other DFP operations. To our knowledge, this is the first hardware design for adding and subtracting IEEE P754 BID-encoded DFP numbers.

tumlin1993
¿Web? An evaluation of the design of the Gamma 60, T. J. Tumlin and M. Smothermann, Actes du 3e colloque de l'Histoire de l'Informatique, 11pp, Sophia-Antipolis, INRIA, 1993.
Abstract: The Bull Gamma 60 remains a major innovation in computer design. Its use of explicit FORK-JOIN parallelism is shown by a simulation model to wisely exploit a large difference in speeds between logic components and memory elements, as found on some machines of the 1950’s. Recently the reappearance of a large speed ratio makes the same type of explicit FORK-JOIN parallelism attractive in advanced designs and validates the latency-tolerant design philosophyof the Gamma 60. The major difficulty of the design is the programming effort required to fully express the parallelism available in programs.

vazquez2007
URL
¿Web? A New Family of High�Performance Parallel Decimal Multipliers, Alvaro V�zquez, Elisardo Antelo, and Paolo Montuschi, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp195–204, IEEE, June 2007.
Abstract: This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry�save multioperand addition that uses a novel BCD�4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We also present three schemes for fast and efficient generation of partial products in parallel. The recoding of the BCD�8421 multiplier operand into minimally redundant signed�digit radix�10, radix�4 and radix�5 representations using new recoders reduces the complexity of partial product generation. In addition, SD radix�4 and radix�5 recodings allow the reuse of a conventional parallel binary radix�4 multiplier to perform combined binary/ decimal multiplications. Evaluation results show that the proposed architectures have interesting area�delay figures compared to conventional Booth radix�4 and radix�8 parallel binary multipliers and other representative alternatives for decimal multiplication.

veerama2007
¿Web? Novel, High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format, Sreehari Veeramachaneni, M.Kirthi Krishna, Lingamneni Avinash, Sreekanth Reddy P, and M.B. Srinivas, IEEE Computer Society Annual Symposium on VLSI (ISVLSI '07), pp343–350, IEEE, May 2007.
Abstract: In view of increasing prominence of commercial, financial and internet-based applications that process data in decimal format, there is a renewed interest in providing hardware support to handle decimal data. In this paper, a new architecture for efficient 1-digit decimal addition of binary coded decimal (BCD) operands, which is the core of high speed multi-operand adders and floating decimal-point arithmetic, is proposed. Based on this 1-digit BCD adder, novel architectures for higher order (n-digit) BCD adders such as ripple carry adder and carry look-ahead adder are derived. The proposed circuits are compared (both qualitatively as well as quantitatively) with the existing circuits in literature and are shown to perform better. Simulation results show that the proposed 1-digit BCD adder achieves an improvement of 40% in delay. The 16-digit BCD lookahead adder using prefix logic is shown to perform at least 80% faster than the existing ripple carry one.

veerama2008
¿Web? A Novel Carry-Look Ahead Approach to a Unified BCD and Binary Adder/Subtractor, Sreehari Veeramachaneni, M. Kirthi Krishna, G. V. Prateek, S. Subroto, S. Bharat, and M. B. Srinivas, Proceedings of the 21st International Conference on VLSI Design (VLSID '08), ISBN 0-7695-3083-4, pp547–552, IEEE Computer Society, January 2008.
Abstract: Increasing prominence of commercial, financial and internet-based applications, which process decimal data, there is an increasing interest in providing hardware support for such data. In this paper, new architecture for efficient binary and Binary Coded Decimal (BCD) adder/subtractor is presented. This employs a new method of subtraction unlike the existing designs which mostly use 10’s complements, to obtain a much lower latency. Though there is a necessity of correction in some cases, the delay overhead is minimal. A complete discussion about such cases and the required logic to process is presented. The architecture is run-time reconfigurable to facilitate both BCD and binary operations, including signed and unsigned numbers. The proposed circuits are compared (both qualitatively as well as quantitatively) with the existing circuits in literature and are shown to perform better. Simulation results show that the proposed architecture is at least 11% faster than the existing designs.

wang2004
¿Web? Decimal Floating-Point Division Using Newton-Raphson Iteration, Liang-Kai Wang and Michael J Schulte, Proceedings of the 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP’04), pp84–95, IEEE Computer Society Press, September 2004.
Abstract: Decreasing feature sizes allow additional functionality to be added to future microprocessors to improve the performance of important application domains. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE-754R). This paper presents an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an optimized piecewise linear approximation, a modified Newton- Raphson iteration, a specialized rounding technique, and a simplified combined decimal incrementer/decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with IEEE-754R, has an estimated critical path delay of 0.69 ns when implemented using LSI Logic’s 0.11 micron gflx-p standard cell library.

wang2005
¿Web? Decimal Floating-Point Square Root Using Newton-Raphson Iteration, Liang-Kai Wang and Michael J Schulte, Proceedings of the 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP’05), pp309–315, IEEE Computer Society Press, July 2005.
Abstract: With continued reductions in feature size, additional functionality may be added to future microprocessors to boost the performance of important application domains. Due to growth in commercial, financial, and Internet-based applications, decimal floating point arithmetic is now attracting more attention, and hardware support for decimal operations is being considered by various computer manufacturers. In order to standardize decimal number formats and operations, specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE-754R). This paper presents an efficient arithmetic algorithm and hardware design for decimal floating-point square root. This design uses an optimized piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a modified decimal multiplier. Synthesis results show that a 64-bit (16-digit) implementation of the decimal square root, which is compliant with the IEEE-754R, has an estimated critical path delay of 0.95 ns and maximum latency of 210 clock cycles when implemented using LSI Logic’s 0.11 micron Gflx-P Standard Cell library.

wang2007
URL
¿Web? Decimal Floating-Point Adder and Multifunction Unit with Injection-Based Rounding, Liang-Kai Wang and Michael J. Schulte, Proceedings of the 18th IEEE Symposium on Computer Arithmetic, ISBN 0-7695-2854-6, ISBN 978-0-7695-2854-0, pp56–65, IEEE, June 2007.
Abstract: Shrinking feature sizes gives more headroom for designers to extend the functionality of microprocessors. The IEEE 754R working group has revised the IEEE 754-1985 Standard for Binary Floating-Point Arithmetic to include specifications for decimal floating-point arithmetic and IBM recently announced incorporating a decimal floatingpoint unit into their POWER6 processor. As processor support for decimal floating-point arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common decimal floating-point arithmetic algorithms. This paper presents novel designs for a decimal floating-point adder and a decimal floating-point multifunction unit. To reduce their delay, both the adder and the multifunction unit use decimal injection-based rounding, a new form of decimal operand alignment, and a fast flag-based method for rounding and overflow detection. Synthesis results indicate that the proposed adder is roughly 21% faster and 1.6% smaller than a previous decimal floating-point adder design, when implemented in the same technology. Compared to the decimal floating-point adder, the decimal floating-point multifunction unit provides six additional operations, yet only has 2.8%more delay and 9.7% more area.

wang2007c
¿Web? A Decimal Floating-Point Divider using Newton-Raphson Iteration, Liang-Kai Wang and Michael J. Schulte, Journal of VLSI Signal Processing Systems, Vol. 49 #1, ISSN 0922-5773, pp3–18, Kluwer Academic Publishers, October 2007.
Abstract: Increasing chip densities and transistor counts provide more room for designers to add functionality for important application domains into future microprocessors. As a result of rapid growth in financial, commercial, and Internet-based applications, hardware support for decimal floating-point arithmetic is now being considered by various computer manufacturers and specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic (IEEE P754). In this paper, we present an efficient arithmetic algorithm and hardware design for decimal floating-point division. The design uses an efficient piecewise linear approximation, a modified Newton-Raphson iteration, a specialized rounding technique, and a simplified decimal incrementer and decrementer. Synthesis results show that a 64-bit (16-digit) implementation of the decimal divider, which is compliant with the current version of IEEE P754, has an estimated critical path delay of 0.69 ns (around 13 FO4 inverter delays) when implemented using LSI Logic’s 0.11 micron Gflx-P standard cell library.

wang2007d
¿Web? Processor support for decimal floating-point arithmetic, Liang-Kai Wang, ISBN 978-0-549-19463-7, 157pp, University of Wisconsin at Madison, 2007.
Abstract: Decimal data permeates society, as humans most commonly use base-ten numbers. Although microprocessors normally use base-two binary arithmetic to obtain faster execution times and simpler circuitry, binary numbers cannot represent decimal fractions exactly. This leads to large errors being accumulated after several decimal operations. Furthermore, binary floating-point arithmetic operations perform binary rounding instead of decimal rounding. Consequently, applications, such as financial, commercial, tax, and Internet-based applications, which are sensitive to representation and rounding errors, often require decimal arithmetic. Due to the increasing importance of and demand for decimal arithmetic, its formats and operations have been specified in the IEEE Draft Standard for Floating-point Arithmetic (IEEE P754).
   Most decimal applications use software routines and binary arithmetic to emulate decimal operations. Although this approach eliminates errors due to converting between binary and decimal numbers and provides decimal rounding to mirror manual calculations, it results in long latencies for numerically intensive commercial applications. This is because software emulation of decimal floating-point (DFP) arithmetic has significant overhead due to function calls, dealing with decimal formats, operand alignment, decimal rounding, and special case and exception handling.
   This dissertation investigates processor support for decimal floating-point arithmetic. It first reviews recent progress in decimal arithmetic, including decimal encodings, the IEEE P754 Draft Standard, and software packages, hardware designs, and benchmark suites for decimal arithmetic. Next, this dissertation presents novel arithmetic algorithms and hardware designs for basic DFP operations, including DFP addition, subtraction, division, square root, and others. Most of the hardware designs presented in this dissertation are the first published designs compliant with the IEEE P754 Draft Standard. Finally, to study the performance impact of DFP instructions and hardware, this dissertation presents the first publicly available benchmark suite for DFP arithmetic. This benchmark suite, along with instruction set extensions and a decimal-enhanced processor simulator, are used to demonstrate that providing fast hardware support for DFP operations leads to significant performance benefits to DFP-intensive applications.

webb2008
URL
¿Web? IBM z10: The Next-Generation Mainframe Microprocessor, Charles Webb, IEEE Micro Vol. 28 #2, ISSN 0272-1732, pp19–29, IEEE, March/April 2008.
Abstract: The IBM system z10 includes four microprocessor cores — each with a private 3-Mbyte cache — and integrated accelerators for decimal floating-point computation, cryptography, and data compression. A separate SMP hub chip provides a shared third-level cache and interconnect fabric for multiprocessor scaling. This article focuses on the high-frequency design techniques used to achieve a 4.4-GHz system, and on the pipeline design that optimizes z10’s CPU performance.

weik1961
URL
¿Web? A Third Survey of Domestic Electronic Digital Computing Systems, Report No. 1115, Martin H. Weik, 1131pp, Ballistic Research Laboratories, Aberdeen Proving Ground, Maryland, March 1961.
Abstract: Based on the results of a third survey, the engineering and programming characteristics of two hundred twenty-two different electronic digital computing systems are given. The data are presented from the point of view of application, numerical and arithmetic characteristics, input, output and storage systems, construction and checking features, power, space, weight, and site preparation and personnel requirements, production records, cost and rental rates, sale and lease policy, reliability, operating experience, and time availability, engineering modifications and improvements and other related topics. An analysis of the survey data, fifteen comparative tables, a discussion of trends, a revised bibliography, and a complete glossary of computer engineering and programming terminology are included.

yuen1977
¿Web? A New Representation for Decimal Numbers, C. K. Yuen, IEEE Transactions on Computers, Vol. 26 #12, pp1286–1288, IEEE, December 1977.
Abstract: A new representation for decimal numbers is proposed. It uses a mixture of positive and negative radixes to ensure that the maximum value of a four bit decimal digit is 9. This eliminates the more complex carry generation process required in BCD addition.

The 93 references listed on this page are selected from the bibliography on Decimal Arithmetic collected by Mike Cowlishaw. Please see the index page for more details and other categories.