The Experts below are selected from a list of 297 Experts worldwide ranked by ideXlab platform

Yedidya Hilewitz - One of the best experts on this subject based on the ideXlab platform.

  • A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations
    IEEE Transactions on Computers, 2009
    Co-Authors: Yedidya Hilewitz
    Abstract:

    This paper describes a new basis for the implementation of the shifter functional unit in microprocessors that can implement new advanced Bit Manipulations as well as standard shifter operations. Our design is based on the inverse butterfly and butterfly data path circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced Bit Manipulation instructions like arbitrary Bit permutations, Bit gather (or parallel extract), and Bit scatter (or parallel deposit) instructions. Thus, our new functional unit performs the functionality of three functional units (the basic shifter, the multimedia-mix unit, and the advanced Bit Manipulation functional unit) while having a latency only slightly longer than that of the log-shifter. For performing only the existing functions of a shifter, it has significantly smaller area.
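
    The butterfly and inverse butterfly data paths named in this abstract can be sketched in software as log2(n) stages of conditional pair swaps. The sketch below models only the data path. The control layout (one list per stage, one control bit per pair) and all function names are assumptions made for illustration, and the paper's method for deriving control bits for shifts and rotates is not reproduced here.

```python
def butterfly_stage(bits, distance, controls):
    """Apply one stage: swap each pair (i, i + distance) whose control bit is 1."""
    out = list(bits)
    pair = 0  # index into this stage's control bits (assumed layout: one bit per pair)
    for block in range(0, len(bits), 2 * distance):
        for i in range(block, block + distance):
            if controls[pair]:
                out[i], out[i + distance] = out[i + distance], out[i]
            pair += 1
    return out


def butterfly(bits, controls):
    """Butterfly network: stage distances n/2, n/4, ..., 1 (n a power of two)."""
    distance = len(bits) // 2
    stage = 0
    while distance >= 1:
        bits = butterfly_stage(bits, distance, controls[stage])
        distance //= 2
        stage += 1
    return bits


def inverse_butterfly(bits, controls):
    """Inverse butterfly network: stage distances 1, 2, ..., n/2."""
    distance = 1
    stage = 0
    while distance < len(bits):
        bits = butterfly_stage(bits, distance, controls[stage])
        distance *= 2
        stage += 1
    return bits


if __name__ == "__main__":
    data = list("abcdefgh")
    # Swapping the two halves is a rotate by 4 positions on 8 elements.
    print("".join(butterfly(data, [[1] * 4, [0] * 4, [0] * 4])))
    # Setting every control bit reverses the element order.
    print("".join(butterfly(data, [[1] * 4] * 3)))
```

    With every control bit cleared the network passes its input through unchanged, which is a convenient sanity check when experimenting with control patterns.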

  • Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors
    Journal of Signal Processing Systems, 2008
    Co-Authors: Yedidya Hilewitz
    Abstract:

    Advanced Bit Manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence of instructions needed to emulate these complicated Bit operations. As these Bit Manipulation operations are relevant to applications that are becoming increasingly important, we propose direct support for them in microprocessors. In particular, we propose fast Bit gather (or parallel extract), Bit scatter (or parallel deposit) and Bit permutation instructions (including group, butterfly and inverse butterfly). We show that all these instructions can be implemented efficiently using both the fast butterfly and inverse butterfly network datapaths. Specifically, we show that parallel deposit can be mapped onto a butterfly circuit and parallel extract can be mapped onto an inverse butterfly circuit. We define static, dynamic and loop invariant versions of the instructions, with static versions utilizing a much simpler functional unit. We show how a hardware decoder can be implemented for the dynamic and loop-invariant versions to generate, dynamically, the control signals for the butterfly and inverse butterfly datapaths. The simplest functional unit we propose is smaller and faster than an ALU. We also show that these instructions yield significant speedups over a basic RISC architecture for a variety of different application kernels taken from applications domains including bioinformatics, steganography, coding, compression and random number generation.

  • Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
    18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007
    Co-Authors: Yedidya Hilewitz
    Abstract:

    This paper describes a new basis for the implementation of a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that performs the standard shift and rotate operations, as well as more advanced extract, deposit and mix operations found in some processors. Additionally, it also supports important new classes of even more advanced Bit Manipulation instructions recently proposed: these include arbitrary Bit permutations, Bit scatter and Bit gather instructions. The new functional unit's datapath is comparable in latency to that of the classic barrel shifter. It replaces two existing functional units, shifter and mix, with a much more powerful one.

  • ASAP - Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions
    IEEE 17th International Conference on Application-specific Systems Architectures and Processors (ASAP'06), 2006
    Co-Authors: Yedidya Hilewitz
    Abstract:

    Current microprocessor instruction set architectures are word oriented, with some subword support. Many important applications, however, can realize substantial performance benefits from bit-oriented instructions. We propose the parallel extract (pex) and parallel deposit (pdep) instructions to accelerate compressing and expanding selections of Bits. We show that these instructions can be implemented by the fast inverse butterfly and butterfly network circuits. We evaluate latency and area costs of alternative functional units for implementing subsets of advanced Bit Manipulation instructions. We show applications exhibiting significant speedup, 3.41× on average over a basic RISC architecture, and 2.48× on average over an instruction set architecture (ISA) that supports extract and deposit instructions.
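
    The semantics of parallel extract and parallel deposit can be illustrated with a bit-serial software reference. This is a minimal sketch assuming non-negative Python integers. The function names are chosen here for illustration, and the loop shows what the instructions compute, not how the butterfly circuits compute it.

```python
def parallel_extract(value, mask):
    """Gather the bits of value selected by mask into the low-order bits of the result."""
    result = 0
    out_pos = 0  # next free low-order position in the result
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            result |= ((value >> bit) & 1) << out_pos
            out_pos += 1
        bit += 1
    return result


def parallel_deposit(value, mask):
    """Scatter the low-order bits of value to the positions selected by mask."""
    result = 0
    in_pos = 0  # next low-order bit of value to consume
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            result |= ((value >> in_pos) & 1) << bit
            in_pos += 1
        bit += 1
    return result


if __name__ == "__main__":
    print(bin(parallel_extract(0b10110010, 0b11110000)))  # selects the upper nibble
    print(bin(parallel_deposit(0b1011, 0b11110000)))      # places it back
```

    The one-iteration-per-mask-bit loop is the kind of emulation sequence that a single-instruction implementation would replace.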

Myung Hoon Sunwoo - One of the best experts on this subject based on the ideXlab platform.

  • Bit Manipulation Accelerator for Communication Systems Digital Signal Processor
    EURASIP Journal on Advances in Signal Processing, 2005
    Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo, Seong K. Oh
    Abstract:

    This paper proposes application-specific instructions and their Bit Manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, interleaving, and Bit stream multiplexing. The proposed DSP employs the BMU supporting parallel shift and XOR (exclusive-OR) operations and Bit insertion/extraction operations on multiple data. The proposed architecture has been modeled by VHDL and synthesized using the SEC 0.18 μm standard cell library and the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced for scrambling, convolutional encoding, and interleaving compared with existing DSPs.
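
    Convolutional encoding is one of the shift-and-XOR workloads this abstract targets. The sketch below is a bit-serial software model of a rate-1/2 encoder. The constraint length 3 and the generator polynomials 7 and 5 (octal) are assumptions chosen for a small example and are not taken from the paper, which does not give its code parameters here.

```python
def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Encode a bit list; each input bit yields one output bit per generator.

    Assumption: the generators and constraint length are illustrative defaults.
    """
    mask = (1 << constraint_length) - 1
    state = 0
    out = []
    for b in bits:
        # Shift the new input bit into the register (newest bit is the LSB).
        state = ((state << 1) | b) & mask
        # Each output bit is the XOR of the register taps chosen by a generator.
        for g in generators:
            out.append(parity(state & g))
    return out


if __name__ == "__main__":
    # A single 1 followed by zeros produces the code's impulse response.
    print(conv_encode([1, 0, 0]))
```

    Every input bit costs a shift, a mask, and one AND plus parity per generator on a word-oriented processor, which is the overhead a dedicated shift-and-XOR unit aims to remove.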

  • Novel Bit Manipulation unit for communication digital signal processors
    2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), 2004
    Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo
    Abstract:

    This paper proposes application-specific instructions and their Bit Manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, and interleaving. The proposed DSP employs the BMU supporting parallel shift and XOR (Exclusive-OR) operations and Bit insertion/extraction operations on multiple data. The proposed architecture has been modeled by VHDL and synthesized using the SEC 0.18 μm standard cell library and the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.

  • Design of Bit Manipulation accelerator for communication DSP
    Proceedings of 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, 2004
    Co-Authors: Suk Hyun Yoon, Sug Hyun Jeong, Myung Hoon Sunwoo
    Abstract:

    This paper proposes a Bit Manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform Bit Manipulation functions since they have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process Bit Manipulation functions using parallel shift and Exclusive-OR (XOR) operations and Bit insertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and logic synthesized using the SEC 0.18 μm standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.

Akash Kumar - One of the best experts on this subject based on the ideXlab platform.

  • FPL - Improving autonomous soft-error tolerance of FPGA through LUT configuration Bit Manipulation
    2013 23rd International Conference on Field programmable Logic and Applications, 2013
    Co-Authors: Shyamsundar Venkataraman, Akash Kumar
    Abstract:

    Soft-errors in LUT configuration Bits of FPGAs can alter the functionality of an implemented design, rendering it useless, unless re-programmed. This paper proposes a technique to improve autonomous fault-masking capabilities of a design by maximizing the number of zeros or ones in LUTs. The technique utilizes spare resources (XOR gates and carry chain) of FPGA devices to selectively manipulate LUT contents using two operations - LUT restructuring and LUT decomposition. Experiments conducted with a wide set of benchmarks from MCNC, IWLS 2005 and ITC99 benchmark suite on Xilinx Virtex 6 FPGA board demonstrate that the proposed methodology maximizes logic 0/1 of LUTs by an average 20% achieving 80% fault-masking with no area overhead. The fault-rate of the entire design is reduced by 60% on average as compared to the existing techniques. Further, an additional 5% fault-masking can be achieved with a 7% increase in slice usage.

Chris Johnson - One of the best experts on this subject based on the ideXlab platform.

  • SIGCSE - A Game-Driven Approach to Teaching Bit Manipulation (Abstract Only)
    Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, 2017
    Co-Authors: Paul Voelker, Chris Johnson
    Abstract:

    The use of educational games to teach and reinforce concepts to students is an idea that has gained popularity in recent years. Games force students to demonstrate their mastery of a subject by applying its principles to complete a goal or solve a problem. Games also offer more frequent feedback on the student's performance along with immediate rewards. These factors can make games more engaging for the student than traditional homework or quizzes. In this poster, the authors present a program which hopes to leverage the advantages games have as a learning tool in order to help students understand the effects of Bit Manipulation. The player controls a factory with a series of pipes that dispense chocolate into trucks waiting below. Using Bitwise operators, the player must manipulate which pipes are open and closed in order to ensure that a pipe is only open if there is a truck aligned beneath it. The player is offered immediate feedback on their performance in the form of empty trucks driving away or wasted chocolate splashing to the ground. Additional challenge can be added to the game by only allowing the player to adjust the pipes one time between each set of trucks. By providing immediate feedback and encouraging creative problem solving, this game may improve students' intuition about the mechanics underlying Bit Manipulation.
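
    The pipe-and-truck mechanic described above maps naturally onto bitwise operators. The sketch below is a hypothetical encoding, not the game's actual implementation, which the abstract does not describe. It assumes bit i of `pipes` is 1 when pipe i is open and bit i of `trucks` is 1 when a truck waits under pipe i.

```python
def open_for_trucks(pipes, trucks):
    """OR: open every pipe that has a truck under it (other pipes keep their state)."""
    return pipes | trucks


def close_without_trucks(pipes, trucks):
    """AND: close every pipe that has no truck under it."""
    return pipes & trucks


def toggle(pipes, selection):
    """XOR: flip the state of the selected pipes."""
    return pipes ^ selection


def wasted(pipes, trucks):
    """Pipes that are open with no truck below (chocolate hits the ground)."""
    return pipes & ~trucks


def empty(pipes, trucks):
    """Trucks waiting under closed pipes (they drive away empty)."""
    return trucks & ~pipes


if __name__ == "__main__":
    pipes, trucks = 0b1100, 0b0110
    print(bin(wasted(pipes, trucks)), bin(empty(pipes, trucks)))
    # One XOR with the mismatch mask aligns the pipes with the trucks.
    print(bin(toggle(pipes, pipes ^ trucks)))
```

    The single-adjustment challenge mentioned in the abstract corresponds, in this encoding, to finding one operator and one operand that take `pipes` to `trucks`.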

M. Re - One of the best experts on this subject based on the ideXlab platform.

  • ICECS - TDES cryptography algorithm acceleration using a reconfigurable functional unit
    2014 21st IEEE International Conference on Electronics Circuits and Systems (ICECS), 2014
    Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re
    Abstract:

    Many cryptography algorithms contain a lot of data Bit Manipulation operations. Unfortunately, the Instruction Set Architecture (ISA) of general purpose microprocessors is usually word oriented. Consequently, the execution of this kind of algorithm is not optimized and the computation of data represented by single Bits or sub-words can require several clock cycles. Reconfigurable hardware accelerators oriented to Bit Manipulation could accelerate the computation of these algorithms, increasing the microprocessor performance in terms of execution time. This work presents the experimental results of the speed-up factor obtained for the implementation of the TDES (Triple Data Encryption Standard) algorithm when a Reconfigurable Functional Unit ADAPTO [1] is integrated with a RISC microprocessor (the Altera NIOS-II soft processor [2]). The ADAPTO unit, described in VHDL (VHSIC Hardware Description Language), has been implemented on an Altera-Stratix II FPGA and integrated with the Nios soft processor using the Custom Logic feature [4]. The objective is the measurement of the speed-up factor related to the introduction of the reconfigurable hardware accelerator.
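
    The cost of single-bit work on a word-oriented ISA can be seen in a table-driven bit permutation of the kind block ciphers such as DES rely on. The sketch below is a generic software permutation. The 8-bit table in the usage example is hypothetical and is not one of the DES tables, and the DES-style numbering (bit 1 is the most significant bit) is an assumption of this example.

```python
def permute(value, table, width):
    """Permute the bits of a width-bit value.

    Output bit i (1 = most significant) is input bit table[i - 1],
    following DES-style 1-based, MSB-first numbering (an assumption here).
    """
    result = 0
    for src in table:
        bit = (value >> (width - src)) & 1  # isolate one source bit
        result = (result << 1) | bit        # append it to the output
    return result


if __name__ == "__main__":
    # Hypothetical table: even-numbered bits first, then odd-numbered bits.
    table = [2, 4, 6, 8, 1, 3, 5, 7]
    print(format(permute(0b10110010, table, 8), "08b"))
```

    Each output bit takes a shift, a mask, another shift and an OR, so a 64-bit permutation costs a few hundred word operations in software, whereas a bit-oriented functional unit can apply the whole table at once.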

  • A Reconfigurable Functional Unit for Modular Operations
    Lecture Notes in Electrical Engineering, 2014
    Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, S. Pontarelli, M. Re
    Abstract:

    The efficiency of standard microprocessors decreases when operations on short data are performed because they are optimized to perform operations on fixed size data. Short data processing and Bit Manipulation can be accelerated by integrating a Reconfigurable Functional Unit (RFU) in parallel with the ALU. An RFU is a tightly coupled integrated Reconfigurable Array used to speed up the computation of a set of operations for which standard microprocessors are not optimized. In this paper we show the benefit of using the Adder-based Dynamic Architecture for Processing Tailored Operators (ADAPTO RFU) [1, 2, 3] (a full adder based RFU) on modular operations. In particular we describe how to speed up the modular addition and the Montgomery Multiplication by using the ADAPTO RFU.
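
    The two modular operations named in this abstract reduce to additions, shifts and conditional subtractions, which is why an adder-based unit suits them. The sketch below shows modular addition and the textbook radix-2 (bit-serial) Montgomery multiplication. It is a generic formulation under stated assumptions (odd modulus n, operands below n, n below 2^k) and does not reproduce the ADAPTO mapping from the paper.

```python
def mod_add(a, b, n):
    """Return (a + b) mod n for 0 <= a, b < n using one conditional subtraction."""
    s = a + b
    return s - n if s >= n else s


def montgomery_multiply(a, b, n, k):
    """Return a * b * 2**(-k) mod n.

    Assumptions: n is odd, 0 <= a, b < n, and n < 2**k.
    """
    t = 0
    for i in range(k):
        # Add b when bit i of a is set.
        if (a >> i) & 1:
            t += b
        # Make t even by adding the odd modulus, then halve exactly.
        if t & 1:
            t += n
        t >>= 1
    # t stays below 2n, so a single subtraction completes the reduction.
    return t - n if t >= n else t


if __name__ == "__main__":
    n, k = 97, 7
    r = montgomery_multiply(45, 76, n, k)
    # Multiplying back by 2**k recovers the ordinary product modulo n.
    print((r << k) % n == (45 * 76) % n)
```

    The loop never performs a division or a multiplication by a multi-bit operand. Each iteration is at most two additions and a one-bit shift.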

  • ACSCC - Integration of butterfly and inverse butterfly nets in embedded processors: Effects on power saving
    2012 Conference Record of the Forty Sixth Asilomar Conference on Signals Systems and Computers (ASILOMAR), 2012
    Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re
    Abstract:

    Many software functions are not efficiently executed by standard microprocessors. This happens when the operation granularity and data wordlength are different with respect to those of the microprocessor's architecture. Important improvements in speed and power can be obtained by integrating hardware accelerators in standard microprocessor architectures. This work, based on [1], shows that the integration of a Bit Manipulation Unit (BMU) [2] in an Altera NIOS-2 soft processor architecture [3] allows very interesting speed-up and power saving factors.

  • WISES - Algorithm acceleration on LEON-2 processor using a reconfigurable Bit Manipulation unit
    2010 8th Workshop on Intelligent Solutions in Embedded Systems, 2010
    Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re
    Abstract:

    Advanced Bit Manipulation operations are not efficiently supported by standard microprocessors since they are optimized for fixed data size operations. In the literature, several hardware solutions are proposed to overcome this problem [1], [3] and [4]. In this work we present the experimental results of a new architecture based on LEON-2 and a simplified version of ADAPTO [1] (Adder-based Dynamic Architecture for Processing Tailored Operators), acting as a co-processor. For our experiments we run a set of Bit Manipulation Algorithms on the LEON-2 processor in the presence and absence of the ADAPTO unit. This makes it possible to measure the speed-up factor obtained using the proposed reconfigurable co-processor.