The experts below are selected from a list of 297 experts worldwide, ranked by the ideXlab platform.
Yedidya Hilewitz - One of the best experts on this subject based on the ideXlab platform.
-
A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations
IEEE Transactions on Computers, 2009. Co-Authors: Yedidya Hilewitz. Abstract: This paper describes a new basis for the implementation of the shifter functional unit in microprocessors that can implement new advanced bit manipulations as well as standard shifter operations. Our design is based on the inverse butterfly and butterfly datapath circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as the more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions, such as arbitrary bit permutations, bit gather (or parallel extract), and bit scatter (or parallel deposit). Thus, our new functional unit provides the functionality of three functional units (the basic shifter, the multimedia-mix unit, and the advanced bit manipulation functional unit) while having a latency only slightly longer than that of the log-shifter. For performing only the existing functions of a shifter, it has a significantly smaller area.
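To make the butterfly and inverse butterfly datapaths more concrete, here is a minimal software model. It assumes n = 2**m bits, one control bit per swappable pair in each stage, a butterfly that visits swap distances n/2, n/4, ..., 1, and an inverse butterfly that visits 1, 2, ..., n/2. The function names are hypothetical. Deriving the control bits for a particular shift, extract or deposit is the job of the hardware described in these papers and is not shown here.

```python
# Software model of butterfly / inverse butterfly networks on n = 2**m bits.
# Assumption: a stage with distance d may exchange bits i and i + d for every
# i with (i & d) == 0; one control bit per pair decides whether it swaps.

def stage(x, d, ctrl, n):
    """Apply one stage with swap distance d; ctrl holds n // 2 control bits."""
    pairs = [i for i in range(n) if not (i & d)]  # exactly n // 2 pairs
    for c, i in zip(ctrl, pairs):
        if c and ((x >> i) & 1) != ((x >> (i + d)) & 1):
            x ^= (1 << i) | (1 << (i + d))  # swapping two differing bits = flipping both
    return x

def butterfly(x, controls, n):
    """controls[k] drives the stage with distance n >> (k + 1): n/2, ..., 1."""
    for k, ctrl in enumerate(controls):
        x = stage(x, n >> (k + 1), ctrl, n)
    return x

def inverse_butterfly(x, controls, n):
    """controls[k] drives the stage with distance 1 << k: 1, ..., n/2."""
    for k, ctrl in enumerate(controls):
        x = stage(x, 1 << k, ctrl, n)
    return x
```

Every stage is a set of disjoint swaps and therefore its own inverse. As a result, `inverse_butterfly(butterfly(x, c, n), c[::-1], n)` returns `x`. With every control bit set, each network maps bit i to bit n - 1 - i, which is a full bit reversal.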
-
Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors
Journal of Signal Processing Systems, 2008. Co-Authors: Yedidya Hilewitz. Abstract: Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequences of instructions needed to emulate these complicated bit operations. As these bit manipulation operations are relevant to increasingly important applications, we propose direct support for them in microprocessors. In particular, we propose fast bit gather (or parallel extract), bit scatter (or parallel deposit) and bit permutation instructions (including group, butterfly and inverse butterfly). We show that all these instructions can be implemented efficiently using the fast butterfly and inverse butterfly network datapaths. Specifically, we show that parallel deposit can be mapped onto a butterfly circuit and parallel extract onto an inverse butterfly circuit. We define static, dynamic and loop-invariant versions of the instructions, with the static versions using a much simpler functional unit. We show how a hardware decoder can be implemented for the dynamic and loop-invariant versions to generate the control signals for the butterfly and inverse butterfly datapaths dynamically. The simplest functional unit we propose is smaller and faster than an ALU. We also show that these instructions yield significant speedups over a basic RISC architecture for a variety of application kernels drawn from domains including bioinformatics, steganography, coding, compression and random number generation.
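The semantics of bit gather (parallel extract) and bit scatter (parallel deposit) can be stated as short reference loops. This is a minimal sketch, not the hardware algorithm. It also shows why software emulation is slow, because it runs one loop iteration per selected mask bit. The semantics shown match the x86 BMI2 `PEXT`/`PDEP` instructions, which appeared later.

```python
def pext(x, mask):
    """Parallel extract (bit gather): pack the bits of x selected by mask into the low end."""
    result, k = 0, 0
    while mask:
        low = mask & -mask      # isolate the lowest set bit of the mask
        if x & low:
            result |= 1 << k
        k += 1
        mask ^= low             # clear that mask bit
    return result

def pdep(x, mask):
    """Parallel deposit (bit scatter): spread the low bits of x into the positions set in mask."""
    result, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            result |= low
        k += 1
        mask ^= low
    return result
```

For example, `pext(0b10110110, 0b11110000)` gathers the upper nibble into `0b1011`, and `pdep` with the same mask scatters it back. In general, `pdep(pext(x, m), m) == x & m`.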
-
Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007. Co-Authors: Yedidya Hilewitz. Abstract: This paper describes a new basis for the implementation of a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that performs the standard shift and rotate operations, as well as the more advanced extract, deposit and mix operations found in some processors. It also supports important new classes of even more advanced, recently proposed bit manipulation instructions: arbitrary bit permutations, bit scatter and bit gather. The new functional unit's datapath is comparable in latency to the classic barrel shifter, and it replaces two existing functional units (the shifter and the mix unit) with a single, much more powerful one.
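A single datapath can serve both the classic and the advanced operations partly because field extract and field deposit are the special cases of bit gather and bit scatter in which the mask is one contiguous run of ones. The sketch below checks that relationship. The helper names are hypothetical, and the field semantics used (right-justified extract, in-place deposit) are an assumption, since the abstract does not spell them out.

```python
def pext(x, mask):
    """Bit gather: pack the bits of x selected by mask into the low end."""
    result, k = 0, 0
    while mask:
        low = mask & -mask
        if x & low:
            result |= 1 << k
        k += 1
        mask ^= low
    return result

def pdep(x, mask):
    """Bit scatter: place the low bits of x at the positions set in mask."""
    result, k = 0, 0
    while mask:
        low = mask & -mask
        if (x >> k) & 1:
            result |= low
        k += 1
        mask ^= low
    return result

def field_mask(pos, length):
    """Contiguous mask of `length` ones starting at bit `pos`."""
    return ((1 << length) - 1) << pos

def extract(x, pos, length):
    """Classic field extract: right-justify `length` bits of x starting at bit `pos`."""
    return (x >> pos) & ((1 << length) - 1)

def deposit(target, x, pos, length):
    """Classic field deposit: write the low `length` bits of x into target at bit `pos`."""
    m = field_mask(pos, length)
    return (target & ~m) | ((x << pos) & m)
```

With a contiguous mask `m = field_mask(pos, length)`, `extract(x, pos, length)` equals `pext(x, m)`, and `deposit(t, x, pos, length)` equals `(t & ~m) | pdep(x, m)`.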
-
ASAP - Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions
IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06), 2006. Co-Authors: Yedidya Hilewitz. Abstract: Current microprocessor instruction set architectures are word-oriented, with some subword support. Many important applications, however, can realize substantial performance benefits from bit-oriented instructions. We propose the parallel extract (pex) and parallel deposit (pdep) instructions to accelerate compressing and expanding selections of bits. We show that these instructions can be implemented by the fast inverse butterfly and butterfly network circuits. We evaluate the latency and area costs of alternative functional units for implementing subsets of advanced bit manipulation instructions. We show applications exhibiting significant speedups: 3.41× on average over a basic RISC architecture, and 2.48× on average over an instruction set architecture (ISA) that supports extract and deposit instructions.
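A single pex can replace several instructions on an ISA that has only field extract and deposit. The sketch below emulates pex with one extract-and-deposit step per contiguous run of ones in the mask and counts those runs. It assumes the run count approximates the instruction cost, which is a simplification for illustration and not the paper's cost model. The function names are hypothetical.

```python
def runs(mask):
    """Yield (pos, length) for each contiguous run of 1s in mask, from the LSB up."""
    pos = 0
    while mask:
        if mask & 1:
            length = 0
            while mask & 1:
                mask >>= 1
                length += 1
            yield pos, length
            pos += length
        else:
            mask >>= 1
            pos += 1

def pext_with_extract(x, mask):
    """Emulate parallel extract using one field extract + deposit per run of 1s.
    Returns (result, number_of_runs)."""
    result, out_pos, n_runs = 0, 0, 0
    for pos, length in runs(mask):
        field = (x >> pos) & ((1 << length) - 1)  # field extract
        result |= field << out_pos                # deposit at the next free position
        out_pos += length
        n_runs += 1
    return result, n_runs
```

A mask such as `0x0F0F0F0F` compresses four nibbles into 16 bits. It needs four extract/deposit steps, whereas a pex instruction needs one operation.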
Myung Hoon Sunwoo - One of the best experts on this subject based on the ideXlab platform.
-
Bit Manipulation Accelerator for Communication Systems Digital Signal Processor
EURASIP Journal on Advances in Signal Processing, 2005. Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo, Seong K. Oh. Abstract: This paper proposes application-specific instructions and their bit manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, interleaving, and bit stream multiplexing. The proposed DSP employs the BMU, which supports parallel shift and XOR (exclusive-OR) operations and bit insertion/extraction operations on multiple data. The proposed architecture has been modeled in VHDL and synthesized using the SEC 0.18 µm standard cell library; the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding, and interleaving compared with existing DSPs.
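Scrambling is a typical shift-and-XOR workload of the kind the BMU targets. The sketch below is a bit-serial additive scrambler. The polynomial x^7 + x^4 + 1 (the one used by the IEEE 802.11 OFDM scrambler) is an assumption, because the abstract does not name one. A BMU of the kind described would process several such bits per cycle with parallel shift and XOR rather than one at a time.

```python
def scramble(bits, seed=0x7F):
    """Additive scrambler, assumed polynomial x^7 + x^4 + 1.
    bits: list of 0/1 values; seed: nonzero 7-bit initial LFSR state.
    Applying it twice with the same seed restores the input."""
    state = seed & 0x7F
    out = []
    for b in bits:
        fb = ((state >> 6) ^ (state >> 3)) & 1  # taps for x^7 and x^4
        state = ((state << 1) | fb) & 0x7F      # shift the feedback bit in
        out.append(b ^ fb)                      # XOR data with the keystream
    return out
```

Because the keystream does not depend on the data, descrambling is the same operation. For a nonzero seed the keystream repeats every 127 bits, since this degree-7 polynomial is primitive.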
-
Novel Bit Manipulation unit for communication digital signal processors
2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), 2004. Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo. Abstract: This paper proposes application-specific instructions and their bit manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, and interleaving. The proposed DSP employs the BMU, which supports parallel shift and XOR (exclusive-OR) operations and bit insertion/extraction operations on multiple data. The proposed architecture has been modeled in VHDL and synthesized using the SEC 0.18 µm standard cell library; the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.
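Convolutional encoding is another shift-and-XOR kernel. Each output bit is the parity of the shift register masked by a generator polynomial. This sketch assumes the widely used rate-1/2, constraint-length-7 code with octal generators 133 and 171, and places the newest input bit at the generator's MSB. The paper's exact codes and bit ordering are not stated in the abstract.

```python
G0, G1 = 0o133, 0o171   # assumed generator polynomials (octal)
K = 7                   # assumed constraint length

def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1

def conv_encode(bits, g0=G0, g1=G1, k=K):
    """Rate-1/2 convolutional encoder: two output bits per input bit.
    The newest input bit enters at bit k-1 (the generators' MSB)."""
    reg = 0
    out = []
    for b in bits:
        reg = (reg >> 1) | (b << (k - 1))  # shift the new bit in at the top
        out.append(parity(reg & g0))       # AND with taps, then XOR-reduce
        out.append(parity(reg & g1))
    return out
```

Feeding in a single 1 followed by six 0s reproduces each generator's bits, MSB first, on the corresponding output stream. This makes it a quick check of the tap ordering.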
-
Design of Bit Manipulation accelerator for communication DSP
Proceedings of 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, 2004. Co-Authors: Suk Hyun Yoon, Sug Hyun Jeong, Myung Hoon Sunwoo. Abstract: This paper proposes a bit manipulation accelerator (BMA) with application-specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot perform bit manipulation functions effectively, since they have multiply-accumulate (MAC) oriented datapaths and word-based functions. The proposed accelerator, however, can process bit manipulation functions efficiently using parallel shift and exclusive-OR (XOR) operations and bit insertion/extraction operations on multiple data. The proposed BMA has been modeled in VHDL and logic-synthesized using the SEC 0.18 µm standard cell library; the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.
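Puncturing and interleaving are mostly bit insertion and extraction, which is why they suit such an accelerator. The sketch below uses a repeating keep/drop pattern for puncturing and a simple row-in, column-out block interleaver. Both the pattern and the interleaver geometry are illustrative assumptions rather than the BMA's actual configuration.

```python
from itertools import cycle

def puncture(coded, pattern):
    """Drop coded bits wherever the repeating puncturing pattern has a 0."""
    return [b for b, keep in zip(coded, cycle(pattern)) if keep]

def interleave(bits, rows, cols):
    """Block interleaver: write row by row into a rows x cols array, read column by column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    """Inverse block interleaver: write column by column, read row by row."""
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]
```

For example, interleaving positions 0 to 5 in a 2 × 3 array yields the order 0, 3, 1, 4, 2, 5. On a word-oriented DSP, each of those moves becomes a separate shift-and-mask sequence.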
Akash Kumar - One of the best experts on this subject based on the ideXlab platform.
-
FPL - Improving autonomous soft-error tolerance of FPGA through LUT configuration Bit Manipulation
2013 23rd International Conference on Field Programmable Logic and Applications, 2013. Co-Authors: Shyamsundar Venkataraman, Akash Kumar. Abstract: Soft errors in the LUT configuration bits of FPGAs can alter the functionality of an implemented design, rendering it useless unless it is reprogrammed. This paper proposes a technique to improve the autonomous fault-masking capability of a design by maximizing the number of zeros or ones in LUTs. The technique uses spare resources of FPGA devices (XOR gates and the carry chain) to selectively manipulate LUT contents through two operations: LUT restructuring and LUT decomposition. Experiments conducted with a wide set of benchmarks from the MCNC, IWLS 2005 and ITC99 benchmark suites on a Xilinx Virtex 6 FPGA board demonstrate that the proposed methodology increases the logic 0/1 content of LUTs by 20% on average, achieving 80% fault masking with no area overhead. The fault rate of the entire design is reduced by 60% on average compared with existing techniques. Further, an additional 5% fault masking can be achieved with a 7% increase in slice usage.
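The following sketch shows one simple way a spare XOR gate can change a LUT's 0/1 balance without changing the logic it implements. It complements the 64-bit configuration of a 6-input LUT and inverts the output again downstream. This illustrates the general idea only and is not the paper's LUT restructuring or decomposition algorithm. The 6-input LUT size and all names are assumptions.

```python
LUT_INPUTS = 6                      # assumed 6-input LUT
LUT_SIZE = 1 << LUT_INPUTS          # 64 configuration bits
FULL = (1 << LUT_SIZE) - 1

def lut_eval(init, addr):
    """Output of a LUT whose truth table is the integer `init`, at input address `addr`."""
    return (init >> addr) & 1

def count_ones(init):
    return bin(init).count("1")

def bias_toward_zeros(init):
    """If most configuration bits are 1, store the complement and flag an output XOR.
    Returns (new_init, xor_flag)."""
    if count_ones(init) > LUT_SIZE // 2:
        return (~init) & FULL, 1
    return init, 0

def eval_with_xor(init, xor_flag, addr):
    """LUT output followed by a (hypothetical) spare XOR gate tied to xor_flag."""
    return lut_eval(init, addr) ^ xor_flag
```

A truth table with 61 ones becomes one with 3 ones, and every input address still produces the original output.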
Chris Johnson - One of the best experts on this subject based on the ideXlab platform.
-
A Game-Driven Approach to Teaching Bit Manipulation (Abstract Only)
Technical Symposium on Computer Science Education, 2017. Co-Authors: Paul Voelker, Chris Johnson. Abstract: The use of educational games to teach and reinforce concepts is an idea that has gained popularity in recent years. Games require students to demonstrate mastery of a subject by applying its principles to reach a goal or solve a problem. Games also offer more frequent feedback on the student's performance, along with immediate rewards. These factors can make games more engaging for students than traditional homework or quizzes. In this poster, the authors present a program that aims to leverage these advantages of games as a learning tool to help students understand the effects of bit manipulation. The player controls a factory with a series of pipes that dispense chocolate into trucks waiting below. Using bitwise operators, the player must open and close pipes so that a pipe is open only if a truck is aligned beneath it. The player receives immediate feedback in the form of empty trucks driving away or wasted chocolate splashing to the ground. Additional challenge can be added by allowing the player to adjust the pipes only once between each set of trucks. By providing immediate feedback and encouraging creative problem solving, this game may improve students' intuition about the mechanics underlying bit manipulation.
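The game's rules map directly onto bitwise expressions. The sketch below assumes eight pipes, with bit i representing pipe i (or the truck below it). The width and function names are hypothetical and not taken from the poster.

```python
WIDTH = 8                       # hypothetical: eight pipes, bit i = pipe i
ALL = (1 << WIDTH) - 1

def outcome(pipes, trucks):
    """Return (wasted, empty) bitmasks for one round."""
    wasted = pipes & ~trucks & ALL   # open pipe with no truck: chocolate on the ground
    empty = trucks & ~pipes & ALL    # truck under a closed pipe: drives away empty
    return wasted, empty

def fix(pipes, trucks):
    """Close the wasting pipes and open the pipes above empty trucks."""
    wasted, empty = outcome(pipes, trucks)
    return (pipes & ~wasted) | empty
```

With pipes `0b10110010` and trucks `0b10010110`, pipe 5 wastes chocolate and the truck under pipe 2 leaves empty. The corrected pipe mask equals the truck mask, which is also what `pipes ^ (pipes ^ trucks)` gives in a single XOR move.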
M. Re - One of the best experts on this subject based on the ideXlab platform.
-
ICECS - TDES cryptography algorithm acceleration using a reconfigurable functional unit
2014 21st IEEE International Conference on Electronics, Circuits and Systems (ICECS), 2014. Co-Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re. Abstract: Many cryptographic algorithms contain a large number of bit manipulation operations. Unfortunately, the instruction set architecture (ISA) of general-purpose microprocessors is usually word-oriented. Consequently, the execution of such algorithms is not optimized, and computing on data represented by single bits or sub-words can require several clock cycles. Reconfigurable hardware accelerators oriented to bit manipulation can speed up these algorithms, improving microprocessor performance in terms of execution time. This work presents experimental results on the speed-up factor obtained for the implementation of the TDES (Triple Data Encryption Standard) algorithm when a reconfigurable functional unit, ADAPTO [1], is integrated with a RISC microprocessor (the Altera Nios II soft processor [2]). The ADAPTO unit, described in VHDL (VHSIC Hardware Description Language), has been implemented on an Altera Stratix II FPGA and integrated with the Nios soft processor using the Custom Logic feature [4]. The objective is to measure the speed-up factor resulting from the introduction of the reconfigurable hardware accelerator.
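Much of DES/TDES consists of fixed, table-driven bit permutations, and on a word-oriented processor each one costs a loop over every bit. The sketch below shows such a permutation using DES's convention of numbering bits 1..n from the MSB. The 8-entry table is a toy example, not one of DES's 64-, 56- or 48-bit tables.

```python
def permute(x, table, n_in):
    """Output bit i (1-based, MSB first) takes input bit table[i-1] of an n_in-bit word."""
    out = 0
    for src in table:
        out = (out << 1) | ((x >> (n_in - src)) & 1)
    return out

def inverse_table(table):
    """Table that undoes a permutation table (same 1-based, MSB-first convention)."""
    inv = [0] * len(table)
    for i, src in enumerate(table):
        inv[src - 1] = i + 1
    return inv

TOY_TABLE = [2, 6, 3, 1, 4, 8, 5, 7]   # toy 8-bit permutation (illustrative only)
```

`permute(0b10110010, TOY_TABLE, 8)` gives `0b00111001`, and permuting with `inverse_table(TOY_TABLE)` restores the input. In hardware, a permutation unit replaces the whole per-bit loop.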
-
A Reconfigurable Functional Unit for Modular Operations
Lecture Notes in Electrical Engineering, 2014. Co-Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, S. Pontarelli, M. Re. Abstract: The efficiency of standard microprocessors decreases when operations on short data are performed, because they are optimized for operations on fixed-size data. Short-data processing and bit manipulation can be accelerated by integrating a Reconfigurable Functional Unit (RFU) in parallel with the ALU. An RFU is a tightly coupled reconfigurable array used to speed up a set of operations for which standard microprocessors are not optimized. In this paper we show the benefit of using the Adder-based Dynamic Architecture for Processing Tailored Operators (ADAPTO RFU) [1, 2, 3], a full-adder-based RFU, for modular operations. In particular, we describe how to speed up modular addition and Montgomery multiplication using the ADAPTO RFU.
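For reference, the two modular operations named above can be written in software as follows. Montgomery multiplication replaces division by the modulus with masks and shifts by R = 2**r_bits, which makes it a natural fit for adder-based hardware. This is a textbook sketch (REDC) and not the ADAPTO mapping, which the abstract does not describe. It requires Python 3.8+ for `pow(m, -1, r)`.

```python
def mod_add(a, b, m):
    """Modular addition for 0 <= a, b < m: one add, one compare, one conditional subtract."""
    s = a + b
    return s - m if s >= m else s

def montgomery_params(m, r_bits):
    """Return m' = -m^{-1} mod R for R = 2**r_bits (m must be odd and smaller than R)."""
    r = 1 << r_bits
    assert m % 2 == 1 and m < r
    return (-pow(m, -1, r)) % r

def redc(t, m, r_bits, m_prime):
    """Montgomery reduction: return t * R^{-1} mod m, for 0 <= t < m * R."""
    r_mask = (1 << r_bits) - 1
    u = ((t & r_mask) * m_prime) & r_mask  # chosen so that t + u*m is divisible by R
    t = (t + u * m) >> r_bits              # exact division by R is just a shift
    return t - m if t >= m else t          # result is below 2m, so one subtraction suffices

def to_mont(a, m, r_bits):
    """Map a into Montgomery form: a * R mod m."""
    return (a << r_bits) % m

def mont_mul(a_bar, b_bar, m, r_bits, m_prime):
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    return redc(a_bar * b_bar, m, r_bits, m_prime)
```

To obtain an ordinary product, convert both operands with `to_mont`, multiply with `mont_mul`, and convert back with a final `redc`.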
-
ACSCC - Integration of butterfly and inverse butterfly nets in embedded processors: Effects on power saving
2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), 2012. Co-Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re. Abstract: Many software functions are not executed efficiently by standard microprocessors. This happens when the operation granularity and data word length differ from those of the microprocessor's architecture. Important improvements in speed and power can be obtained by integrating hardware accelerators into standard microprocessor architectures. This work, based on [1], shows that integrating a bit manipulation unit (BMU) [2] into an Altera Nios II soft processor architecture [3] yields notable speed-up and power-saving factors.
-
Algorithm acceleration on LEON-2 processor using a reconfigurable Bit Manipulation unit
2010 8th Workshop on Intelligent Solutions in Embedded Systems, 2010. Co-Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re. Abstract: Advanced bit manipulation operations are not efficiently supported by standard microprocessors, since these are optimized for fixed-size data operations. Several hardware solutions have been proposed in the literature to overcome this problem [1], [3], [4]. In this work we present experimental results for a new architecture based on the LEON-2 processor and a simplified version of ADAPTO [1] (Adder-based Dynamic Architecture for Processing Tailored Operators) acting as a co-processor. In our experiments we run a set of bit manipulation algorithms on the LEON-2 processor with and without the ADAPTO unit, which allows us to measure the speed-up factor obtained with the proposed reconfigurable co-processor.
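The abstract does not list which bit manipulation algorithms were benchmarked. Population count is a common kernel of this kind and serves here as a hypothetical example. The sketch contrasts a bit-serial loop with a word-parallel (SWAR) version built only from additions, shifts and masks on sub-word fields. This pattern illustrates why adder-based units suit such workloads, but it does not describe how ADAPTO itself computes anything.

```python
def popcount_loop(x):
    """Bit-serial population count of a 32-bit word: 32 iterations."""
    x &= 0xFFFFFFFF
    count = 0
    for _ in range(32):
        count += x & 1
        x >>= 1
    return count

def popcount_swar(x):
    """Word-parallel (SWAR) population count of a 32-bit word."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                 # counts in 2-bit fields
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # counts in 4-bit fields
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # counts in 8-bit fields
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24    # sum of the four byte counts
```

Both functions return 13 for `0x12345678`. The SWAR version replaces the 32-step loop with a fixed handful of word operations.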