The experts below are selected from a list of 542,301 experts worldwide ranked by the ideXlab platform.
Fabrizio Lombardi - One of the best experts on this subject based on the ideXlab platform.
-
Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation
IEEE Transactions on Computers, 2016
Co-Authors: Honglan Jiang, Fei Qiao, Fabrizio Lombardi
Abstract: The Booth multiplier has been widely used for high-performance signed multiplication because its encoding reduces the number of partial products. A multiplier using the radix-4 (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-8 Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate 2-bit adder is deliberately designed for calculating the sum of 1× and 2× of a binary number. This adder requires a small area, consumes little power, and has a short critical path delay. Subsequently, the 2-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In pursuit of a trade-off between accuracy and power consumption, two signed 16×16-bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder, with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power-efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter, where they show better performance than other approximate Booth multipliers.
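To make the "odd multiples" problem concrete, the sketch below recodes a 16-bit two's-complement multiplier into radix-8 Booth digits in the range -4 to 4 and forms the exact product from the multiples 0, ±1×, ±2×, ±3× and ±4× of the multiplicand. It models only the accurate radix-8 Booth multiplier; the function names are hypothetical, and the approximate 2-bit adder and recoding adder from the paper are not reproduced. The ±3× multiple is the only one that needs a real addition (1× + 2×), which is the carry propagation the approximate recoding adder targets.

```python
def booth_radix8_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement integer y into radix-8 Booth digits, LSB first.

    Each digit is -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0,
    so every digit lies in the range -4..4.
    """
    assert -(1 << (n - 1)) <= y < (1 << (n - 1)), "y must fit in n bits"
    groups = -(-n // 3)                      # ceil(n / 3) digits
    bits = y & ((1 << (3 * groups)) - 1)     # two's complement, sign-extended to 3*groups bits
    digits = []
    prev = 0                                 # the implicit bit y[-1] = 0
    for i in range(groups):
        b0 = (bits >> (3 * i)) & 1
        b1 = (bits >> (3 * i + 1)) & 1
        b2 = (bits >> (3 * i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2                            # overlapping bit shared with the next group
    return digits


def booth_radix8_multiply(x: int, y: int, n: int = 16) -> int:
    """Exact radix-8 Booth product of two n-bit signed integers."""
    # 3x is the "hard" odd multiple: it needs an actual addition (1x + 2x),
    # while 2x and 4x are plain shifts.
    x3 = x + (x << 1)
    multiples = {0: 0, 1: x, 2: x << 1, 3: x3, 4: x << 2}
    product = 0
    for i, d in enumerate(booth_radix8_digits(y, n)):
        pp = multiples[abs(d)]
        if d < 0:
            pp = -pp                         # hardware would invert the bits and add 1
        product += pp << (3 * i)             # partial product i has weight 8**i
    return product


if __name__ == "__main__":
    print(booth_radix8_digits(7, 16))          # [-1, 1, 0, 0, 0, 0]
    print(booth_radix8_multiply(-1234, 567))   # -699678
```

Since 7 = -1 + 1·8, the recoding above yields six digits (six partial products) for a 16-bit operand, compared with eight for radix-4 recoding.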
-
A Low-Power, High-Performance Approximate Multiplier with Configurable Partial Error Recovery
Design Automation and Test in Europe, 2014
Co-Authors: Cong Liu, Jie Han, Fabrizio Lombardi
Abstract: Approximate circuits have been considered for error-tolerant applications that can tolerate some loss of accuracy in exchange for improved performance and energy efficiency. Multipliers are key arithmetic circuits in many such applications, such as digital signal processing (DSP). In this paper, a novel approximate multiplier with lower power consumption and a shorter critical path than traditional multipliers is proposed for high-performance DSP applications. This multiplier leverages a newly designed approximate adder that limits its carry propagation to the nearest neighbors for fast partial product accumulation. Different levels of accuracy can be achieved through configurable error recovery by using different numbers of most significant bits (MSBs) for error reduction. The approximate multiplier has a low mean error distance, i.e., most of the errors are not significant in magnitude. Compared to the Wallace multiplier, a 16-bit approximate multiplier implemented in a 28 nm CMOS process shows a 20% reduction in delay and up to a 69% reduction in power. It is shown that, with an appropriate error recovery, the proposed approximate multiplier achieves processing accuracy similar to that of traditional exact multipliers, but with significant improvements in power and performance.
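The mean error distance mentioned above can be evaluated exhaustively for small operand widths. The sketch below assumes the usual definition of error distance as |M' - M| between an approximate product M' and the exact product M, and uses a hypothetical column-truncated multiplier (partial-product bits in the k least significant columns are dropped) purely as a stand-in; it is not the carry-limited adder design or the error recovery scheme from the paper.

```python
from itertools import product


def truncated_mul(a: int, b: int, n: int, k: int) -> int:
    """Hypothetical approximate multiplier (stand-in only): unsigned n x n product
    with every partial-product bit in columns 0..k-1 discarded."""
    result = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n):
                if (b >> j) & 1 and i + j >= k:
                    result += 1 << (i + j)
    return result


def mean_error_distance(approx_mul, n: int) -> float:
    """Average of |approx_mul(a, b) - a*b| over all unsigned n-bit operand pairs."""
    total = 0
    for a, b in product(range(1 << n), repeat=2):
        total += abs(approx_mul(a, b) - a * b)
    return total / (1 << (2 * n))


if __name__ == "__main__":
    n = 6
    for k in (0, 2, 4, 6):
        approx = lambda a, b, k=k: truncated_mul(a, b, n, k)
        med = mean_error_distance(approx, n)
        nmed = med / ((1 << n) - 1) ** 2     # normalized by the largest exact product
        print(f"k={k}: MED={med:.3f}, NMED={nmed:.5f}")
```

Exhaustive enumeration grows as 4^n, so for 16-bit designs such metrics are usually estimated from random samples instead.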
He Chen - One of the best experts on this subject based on the ideXlab platform.
-
An Energy-Efficient Multiplier with Fully Overlapped Partial Products Reduction and Final Addition
IEEE Transactions on Circuits and Systems I: Regular Papers, 2016
Co-Authors: M D Ercegovac, He Chen
Abstract: An energy-efficient fast array multiplier is proposed and designed. The multiplier operates in a left-to-right mode, enabling a full overlap between the reduction of partial products in carry-save form and the final addition producing the product. The design is based on the left-to-right carry-free (LRCF) multiplier. It differs from the LRCF multiplier in having a much smaller on-the-fly conversion circuit of O(n) size and in using radix-4 full adders in the conversion. The new converter produces the most-significant half of the product during the reduction process, eliminating the most-significant part of the final adder. The least-significant half of the product is obtained with a carry-ripple adder during the reduction. Thus, conversion of the carry-save form of the accumulated partial products to the conventional product does not add any delay to the total time of the multiplier. Several right-to-left and left-to-right multipliers, as well as tree multipliers, are designed for 16, 24, 32, and 56 bits and radices 2 and 4, synthesized in 90 nm technology, and compared, demonstrating the advantages and disadvantages of the proposed design with respect to area, delay, power, and energy. We considered both truncated and full-precision multipliers. The proposed multiplier has lower delay, area, power, and energy than the other types of array multipliers considered. Its advantages grow as precision increases. As expected, it is slower than a tree multiplier, but it has smaller area, power, and energy.
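The on-the-fly conversion that the LRCF family relies on can be sketched generically: signed digits arriving most-significant first are converted to conventional form by keeping two candidate values, Q and QM = Q - 1, and only ever appending a non-negative digit to one of them, so no carry propagates. This is the textbook radix-r formulation, not the reduced O(n) converter with radix-4 full adders described in the paper; the function name is hypothetical, and Python integers stand in for digit registers.

```python
import random


def on_the_fly_convert(digits: list[int], r: int = 2) -> int:
    """Convert signed digits (MSB first, each in -(r-1)..r-1) to a conventional integer.

    Two candidates are kept: Q (the value so far) and QM = Q - 1. Each step forms
    the new Q and QM by appending one digit in 0..r-1 to either Q or QM, so no
    carry ever propagates through the digits already produced.
    """
    q, qm = 0, -1                               # invariant: qm == q - 1
    for d in digits:
        if not -r < d < r:
            raise ValueError(f"digit {d} outside the radix-{r} signed-digit set")
        # New Q = Q*r + d, expressed as an append to Q (d >= 0) or to QM (d < 0).
        new_q = q * r + d if d >= 0 else qm * r + (r + d)
        # New QM = Q*r + d - 1, expressed as an append to Q (d > 0) or to QM (d <= 0).
        new_qm = q * r + (d - 1) if d > 0 else qm * r + (r - 1 + d)
        q, qm = new_q, new_qm
    return q


if __name__ == "__main__":
    # Radix-2 digits 1, 0, -1, 1 encode 8 - 2 + 1 = 7.
    print(on_the_fly_convert([1, 0, -1, 1]))
    ds = [random.randint(-3, 3) for _ in range(12)]
    print(on_the_fly_convert(ds, r=4) == sum(d * 4 ** (11 - i) for i, d in enumerate(ds)))
```

In hardware, each update is a shift plus wiring one digit into the new least significant position, and selecting between Q and QM is a multiplexer, which is why the conversion can keep pace with the digit stream.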
Youngho Park - One of the best experts on this subject based on the ideXlab platform.
-
New Architecture for Multiplication in GF(2^m) and Comparisons with Normal and Polynomial Basis Multipliers for Elliptic Curve Cryptography
International Conference on Information Security and Cryptology, 2005
Co-Authors: Soonhak Kwon, Taekyoung Kwon, Youngho Park
Abstract: We propose a new linear multiplier which is comparable to linear polynomial basis multipliers in terms of area and time complexity. We also give a very detailed comparison of our multiplier with the normal and polynomial basis multipliers for the five binary fields GF(2^m), m = 163, 233, 283, 409, 571, recommended by NIST for the Elliptic Curve Digital Signature Algorithm (ECDSA).
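For reference, polynomial basis multiplication in GF(2^m) is a carry-less shift-and-XOR product followed by reduction modulo an irreducible polynomial. The bit-serial, MSB-first sketch below is the conventional polynomial basis method, not the new linear multiplier or a normal basis multiplier from the paper; the helper name is hypothetical. The reduction polynomial for m = 163 is the NIST B-163 pentanomial x^163 + x^7 + x^6 + x^3 + 1, and the small GF(2^8) case uses the AES polynomial only as an easily checked example.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Polynomial basis product a*b in GF(2^m).

    a and b are bit vectors of polynomial coefficients (bit i = coefficient of x^i),
    and poly is the irreducible reduction polynomial including its x^m term.
    b is scanned most significant bit first, one bit per step, as in a bit-serial multiplier.
    """
    assert a >> m == 0 and b >> m == 0, "operands must have degree < m"
    result = 0
    for i in range(m - 1, -1, -1):
        result <<= 1                   # multiply the partial result by x
        if (result >> m) & 1:          # degree reached m: reduce modulo poly (XOR, no carries)
            result ^= poly
        if (b >> i) & 1:               # add a * b_i (addition in GF(2) is XOR)
            result ^= a
    return result


# NIST B-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.
B163_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
# AES polynomial x^8 + x^4 + x^3 + x + 1, used here only as a small, easily checked field.
AES_POLY = 0x11B

if __name__ == "__main__":
    print(hex(gf2m_mul(0x57, 0x83, 8, AES_POLY)))   # 0xc1, the worked example in FIPS 197
```

Because the loop runs once per bit of b, the latency grows linearly with m, which is the time-complexity dimension the comparison above is concerned with.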
Bogdan Pasca - One of the best experts on this subject based on the ideXlab platform.
-
Extracting INT8 Multipliers from INT18 Multipliers
2019 29th International Conference on Field Programmable Logic and Applications (FPL), 2019
Co-Authors: Martin Langhammer, Bogdan Pasca, Gregg Baeckler, Sergey Gribok
Abstract: With the advent of machine learning as perhaps the most high-profile application area for FPGAs, there is a compelling reason to improve the provision of smaller-precision arithmetic on these devices. INT8 is commonly used for AI inferencing and, along with some additional soft logic for exponent handling, can be an effective solution for training as well. This paper describes techniques for efficiently extracting INT8 multipliers from the INT18 multipliers commonly available in many modern FPGAs. A small amount of soft logic (as little as 7 ALMs per INT8 multiplier) is required to provide pre- or post-multiplier correction to calculate two INT8 multiplies from a single 18x18 multiplier. We present two configurations, for both signed and unsigned representations, in which two multiplications share one input operand. In addition to the individual INT8 variants, we present full-device cases of 22,400 INT8 multipliers organized as DOT32 product arrays, with the soft logic tightly bound to the INT18-based DSP blocks. The majority of the soft logic and routing in the device is left untouched and available for application development.
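The general idea of obtaining two products that share one operand from a single wider multiplication can be sketched as follows. This is a generic operand-packing illustration, not the 18x18 correction scheme from the paper: it assumes a multiplier wide enough that the two results occupy separate fields (a 16-bit field spacing, so the packed operand is about 24 bits), whereas fitting both products into a single 18x18 multiplier is what requires the pre- or post-multiplier correction logic the paper describes. Function and parameter names are hypothetical.

```python
def packed_dual_mul(a1: int, a2: int, b: int, k: int = 16) -> tuple[int, int]:
    """Compute a1*b and a2*b (signed INT8) with one multiplication of a packed operand.

    Generic operand packing: the wide multiplier sees (a1 * 2**k + a2) * b
    = (a1*b) * 2**k + a2*b. With k = 16 the low field always holds a2*b exactly,
    because every signed INT8 product lies in [-16256, 16384].
    """
    for v in (a1, a2, b):
        assert -128 <= v <= 127, "operands must be signed INT8"
    packed = a1 * (1 << k) + a2            # one wide operand holding both INT8 values
    p = packed * b                          # the single wide multiplication
    low = p & ((1 << k) - 1)                # low k bits of the product
    if low >= 1 << (k - 1):                 # sign-extend: reinterpret as a signed k-bit field
        low -= 1 << k
    high = (p - low) >> k                   # remove the low product; exact since divisible by 2**k
    return high, low


if __name__ == "__main__":
    print(packed_dual_mul(-128, 127, -128))   # (16384, -16256)
```

The sign extension of the low field is what lets the upper product be recovered exactly; with a narrower field spacing the two products would overlap and some correction step would be needed.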
-
Multipliers for Floating-Point Double Precision and Beyond on FPGAs
ACM SIGARCH Computer Architecture News, 2010
Co-Authors: Sebastian Banescu, Florent De Dinechin, Bogdan Pasca, Radu Tudoran
Abstract: The implementation of high-precision floating-point applications on reconfigurable hardware requires large multipliers. Full multipliers are the core of floating-point multipliers. Truncated multipliers, trading resources for a well-controlled accuracy degradation, are useful building blocks in situations where a full multiplier is not needed. This work studies the automated generation of such multipliers using the embedded multipliers and adders present in the DSP blocks of current FPGAs. The optimization of such multipliers is expressed as a tiling problem, where a tile represents a hardware multiplier, and super-tiles represent combinations of several hardware multipliers and adders, making efficient use of the DSP internal resources. This tiling technique is shown to adapt to full or truncated multipliers. It addresses arbitrary precisions, including single and double precision as well as the quadruple precision introduced by the IEEE 754-2008 standard and currently unsupported by processor hardware. An open-source implementation is provided in the FloPoCo project.
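As a baseline for the tiling formulation, the sketch below decomposes a large unsigned product into a uniform grid of w × w tiles, each standing for one embedded multiplier, and sums the shifted tile products. The tile width w = 17 is an assumption (the unsigned width usable from a signed 18x18 embedded multiplier); the optimized non-uniform tilings, super-tiles, and truncation handled by FloPoCo are not modelled, and the function name is hypothetical.

```python
def tiled_multiply(x: int, y: int, n: int, w: int = 17) -> tuple[int, int]:
    """Unsigned n x n product built from a uniform grid of w x w tiles.

    Each tile is one small multiplication (one embedded multiplier); the shifted
    tile products are summed, as DSP-block adders or soft logic would do.
    Returns (product, number_of_tiles).
    """
    assert 0 <= x < (1 << n) and 0 <= y < (1 << n)
    chunks = -(-n // w)                       # ceil(n / w) slices per operand
    mask = (1 << w) - 1
    total, tiles = 0, 0
    for i in range(chunks):
        xi = (x >> (i * w)) & mask            # w-bit slice i of x
        for j in range(chunks):
            yj = (y >> (j * w)) & mask        # w-bit slice j of y
            total += (xi * yj) << ((i + j) * w)
            tiles += 1
    return total, tiles


if __name__ == "__main__":
    x = (1 << 53) - 1                         # largest 53-bit significand
    prod, tiles = tiled_multiply(x, x, 53)
    print(prod == x * x, tiles)               # True 16
```

For a 53-bit double-precision significand with w = 17, this naive grid needs 4 × 4 = 16 tiles even though the last slice holds only 2 bits; reducing that waste is what better tilings aim for.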
-
Large Multipliers with Less DSP Blocks
2009
Co-Authors: Florent De Dinechin, Bogdan Pasca
Abstract: Recent computing-oriented FPGAs feature DSP blocks that include small embedded multipliers. A large integer multiplier, for instance one inside a double-precision floating-point multiplier, consumes many of these DSP blocks. This article studies three non-standard implementation techniques for large multipliers: the Karatsuba-Ofman algorithm, non-standard multiplier tiling, and specialized squarers. They allow large multipliers to work at the peak frequency of the DSP blocks while reducing DSP block usage. Their overhead in terms of logic resources, if any, is much lower than that of emulating embedded multipliers. Their latency overhead, if any, is very small. Complete algorithmic descriptions are provided, carefully mapped onto recent Xilinx and Altera devices, and validated by synthesis results.
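Two of the techniques named above can be sketched at the integer level. One level of Karatsuba-Ofman splitting replaces the four half-size products of schoolbook multiplication with three, at the cost of extra additions and one slightly wider product; a squarer exploits x0·x1 = x1·x0 so that it also needs only three distinct sub-products. This is an arithmetic sketch only: the split width w and the function names are hypothetical, and the mapping onto specific Xilinx or Altera DSP blocks is not modelled.

```python
def karatsuba_one_level(x: int, y: int, w: int) -> int:
    """Unsigned product from three sub-products instead of four (one Karatsuba-Ofman level).

    With x = x1*2**w + x0 and y = y1*2**w + y0:
    x*y = x1*y1*2**(2w) + ((x0+x1)*(y0+y1) - x0*y0 - x1*y1)*2**w + x0*y0.
    """
    mask = (1 << w) - 1
    x0, x1 = x & mask, x >> w
    y0, y1 = y & mask, y >> w
    p0 = x0 * y0                       # sub-product 1
    p2 = x1 * y1                       # sub-product 2
    pm = (x0 + x1) * (y0 + y1)         # sub-product 3: operands one bit wider than w
    p1 = pm - p0 - p2                  # equals x0*y1 + x1*y0, obtained without a 4th multiply
    return (p2 << (2 * w)) + (p1 << w) + p0


def split_square(x: int, w: int) -> int:
    """Square from three distinct sub-products: the cross term x0*x1 is computed once and doubled."""
    mask = (1 << w) - 1
    x0, x1 = x & mask, x >> w
    cross = x0 * x1
    # x^2 = x1^2 * 2^(2w) + 2*x0*x1 * 2^w + x0^2; doubling is a free shift.
    return ((x1 * x1) << (2 * w)) + (cross << (w + 1)) + x0 * x0


if __name__ == "__main__":
    v = (1 << 34) - 1
    print(karatsuba_one_level(v, v, 17) == v * v, split_square(v, 17) == v * v)   # True True
```

In both functions the savings come from trading a multiplication for additions, which are cheap in FPGA logic relative to an extra DSP block.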
M D Ercegovac - One of the best experts on this subject based on the ideXlab platform.
-
An Energy-Efficient Multiplier with Fully Overlapped Partial Products Reduction and Final Addition
IEEE Transactions on Circuits and Systems I: Regular Papers, 2016
Co-Authors: M D Ercegovac, He Chen
Abstract: identical to the entry of the same title listed under He Chen above.