Bit Manipulation - Explore the Science & Experts

The Experts below are selected from a list of 297 Experts worldwide ranked by ideXlab platform

Yedidya Hilewitz - One of the best experts on this subject based on the ideXlab platform.

A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations

IEEE Transactions on Computers, 2009

Co-Authors: Yedidya Hilewitz

Abstract:

This paper describes a new basis for the implementation of the shifter functional unit in microprocessors that can implement new advanced bit manipulations as well as standard shifter operations. Our design is based on the inverse butterfly and butterfly data path circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions like arbitrary bit permutations, bit gather (or parallel extract), and bit scatter (or parallel deposit) instructions. Thus, our new functional unit performs the functionality of three functional units (the basic shifter, the multimedia-mix unit, and the advanced bit manipulation functional unit) while having a latency only slightly longer than that of the log-shifter. For performing only the existing functions of a shifter, it has significantly smaller area.
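For orientation, the conventional log-shifter that the abstract uses as its baseline can be modelled in software as a chain of log2(width) conditional stages, where stage k rotates by 2^k positions if bit k of the shift amount is set. The sketch below is an illustrative model of that baseline only, not the butterfly-based unit the paper proposes; the function name `log_rotate_left` and the default 32-bit width are assumptions made for the example.

```python
def log_rotate_left(x: int, amount: int, width: int = 32) -> int:
    """Rotate x left by `amount` bits using log2(width) conditional stages.

    Hypothetical software model of a log-shifter; `width` must be a power of two.
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    mask = (1 << width) - 1
    x &= mask
    amount %= width
    step = 1
    while step < width:
        # Stage for this power of two: rotate by `step` only if that bit of
        # the shift amount is set, otherwise pass the word through unchanged.
        if amount & step:
            x = ((x << step) | (x >> (width - step))) & mask
        step <<= 1
    return x
```

Each loop iteration corresponds to one stage of multiplexers in hardware, which is why the latency of such a shifter grows with log2 of the word width.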

Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors

Journal of Signal Processing Systems, 2008

Co-Authors: Yedidya Hilewitz

Abstract:

Advanced bit manipulation operations are not efficiently supported by commodity word-oriented microprocessors. Programming tricks are typically devised to shorten the long sequence of instructions needed to emulate these complicated bit operations. As these bit manipulation operations are relevant to applications that are becoming increasingly important, we propose direct support for them in microprocessors. In particular, we propose fast bit gather (or parallel extract), bit scatter (or parallel deposit) and bit permutation instructions (including group, butterfly and inverse butterfly). We show that all these instructions can be implemented efficiently using both the fast butterfly and inverse butterfly network datapaths. Specifically, we show that parallel deposit can be mapped onto a butterfly circuit and parallel extract can be mapped onto an inverse butterfly circuit. We define static, dynamic and loop-invariant versions of the instructions, with static versions utilizing a much simpler functional unit. We show how a hardware decoder can be implemented for the dynamic and loop-invariant versions to generate, dynamically, the control signals for the butterfly and inverse butterfly datapaths. The simplest functional unit we propose is smaller and faster than an ALU. We also show that these instructions yield significant speedups over a basic RISC architecture for a variety of different application kernels taken from application domains including bioinformatics, steganography, coding, compression and random number generation.
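To make the butterfly and inverse butterfly datapaths concrete, here is a minimal software model, assuming the usual textbook structure: an n-bit network has log2(n) stages, each stage pairs bits a fixed distance apart, and one control bit per pair decides whether the pair is swapped or passed through. The function names and the way control bits are packed (one integer per stage, least significant bit for the lowest pair) are assumptions for this sketch and are not taken from the paper.

```python
def _stage(x: int, distance: int, controls: int, width: int) -> int:
    """Swap each bit pair (i, i + distance) whose control bit is 1."""
    pair = 0  # index into this stage's width/2 control bits
    for i in range(width):
        if i & distance:
            continue  # i is the upper element of a pair, already handled
        if (controls >> pair) & 1:
            lo = (x >> i) & 1
            hi = (x >> (i + distance)) & 1
            if lo != hi:
                x ^= (1 << i) | (1 << (i + distance))  # exchange the two bits
        pair += 1
    return x


def butterfly(x: int, stage_controls: list, width: int = 8) -> int:
    """Butterfly network: pair distances width/2, width/4, ..., 1."""
    distance = width // 2
    for controls in stage_controls:
        x = _stage(x, distance, controls, width)
        distance //= 2
    return x


def inverse_butterfly(x: int, stage_controls: list, width: int = 8) -> int:
    """Inverse butterfly network: pair distances 1, 2, ..., width/2."""
    distance = 1
    for controls in stage_controls:
        x = _stage(x, distance, controls, width)
        distance *= 2
    return x
```

Because every stage is its own inverse, feeding the output of `butterfly` into `inverse_butterfly` with the same per-stage controls in reversed order restores the input. Generating the control bits for a given permutation, gather, or scatter is the job of the decoder the abstract mentions and is not modelled here.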

Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors

18th IEEE Symposium on Computer Arithmetic (ARITH '07), 2007

Co-Authors: Yedidya Hilewitz

Abstract:

This paper describes a new basis for the implementation of a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that performs the standard shift and rotate operations, as well as more advanced extract, deposit and mix operations found in some processors. Additionally, it supports important new classes of even more advanced bit manipulation instructions recently proposed: these include arbitrary bit permutations, bit scatter and bit gather instructions. The new functional unit's datapath is comparable in latency to that of the classic barrel shifter. It replaces two existing functional units (shifter and mix) with a much more powerful one.

ASAP - Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions

IEEE 17th International Conference on Application-specific Systems Architectures and Processors (ASAP'06), 2006

Co-Authors: Yedidya Hilewitz

Abstract:

Current microprocessor instruction set architectures are word oriented, with some subword support. Many important applications, however, can realize substantial performance benefits from bit-oriented instructions. We propose the parallel extract (pex) and parallel deposit (pdep) instructions to accelerate compressing and expanding selections of bits. We show that these instructions can be implemented by the fast inverse butterfly and butterfly network circuits. We evaluate latency and area costs of alternative functional units for implementing subsets of advanced bit manipulation instructions. We show applications exhibiting significant speedup, 3.41× on average over a basic RISC architecture, and 2.48× on average over an instruction set architecture (ISA) that supports extract and deposit instructions.
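The semantics of parallel extract and parallel deposit can be illustrated with a bit-at-a-time software emulation. The sketch below assumes the usual definition: pex gathers the bits of the source selected by a mask into the low-order end of the result, and pdep scatters the low-order source bits to the positions marked by the mask. The Python function names mirror the mnemonics in the abstract, but the loop is only an emulation of what the proposed hardware would do in a single instruction.

```python
def pex(x: int, mask: int) -> int:
    """Parallel extract: compress the bits of x selected by mask to the right."""
    result = 0
    out_pos = 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of the mask
        if x & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1               # clear that mask bit
    return result


def pdep(x: int, mask: int) -> int:
    """Parallel deposit: expand the low bits of x to the positions set in mask."""
    result = 0
    src_pos = 0
    while mask:
        lowest = mask & -mask
        if (x >> src_pos) & 1:
            result |= lowest
        src_pos += 1
        mask &= mask - 1
    return result
```

For example, `pex(0b10110010, 0b11110000)` yields `0b1011`, and `pdep(0b1011, 0b11110000)` puts those four bits back at `0b10110000`. The emulation takes one loop iteration per set mask bit, which is the kind of instruction sequence the proposed instructions are meant to replace.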


Myung Hoon Sunwoo - One of the best experts on this subject based on the ideXlab platform.

Bit Manipulation Accelerator for Communication Systems Digital Signal Processor

EURASIP Journal on Advances in Signal Processing, 2005

Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo, Seong K. Oh

Abstract:

This paper proposes application-specific instructions and their bit manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, interleaving, and bit stream multiplexing. The proposed DSP employs the BMU supporting parallel shift and XOR (exclusive-OR) operations and bit insertion/extraction operations on multiple data. The proposed architecture has been modeled in VHDL and synthesized using the SEC 0.18 μm standard cell library, and the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding, and interleaving compared with existing DSPs.
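Scrambling is a good example of the shift-and-XOR work the abstract refers to. The sketch below is a generic additive scrambler driven by a 7-bit linear feedback shift register; the polynomial x^7 + x^4 + 1, the seed value, and the function name `scramble` are assumptions chosen for illustration and are not taken from the paper.

```python
def scramble(bits: list, seed: int = 0b1011101) -> list:
    """Additive scrambler: XOR each data bit with an LFSR-generated bit.

    Illustrative polynomial x^7 + x^4 + 1; the seed must be non-zero.
    """
    state = seed & 0x7F
    out = []
    for b in bits:
        # Feedback is the XOR of the taps for x^7 (bit 6) and x^4 (bit 3).
        feedback = ((state >> 6) ^ (state >> 3)) & 1
        state = ((state << 1) | feedback) & 0x7F   # shift the register
        out.append((b & 1) ^ feedback)             # XOR data with the sequence
    return out
```

Because the register sequence does not depend on the data, applying `scramble` a second time with the same seed recovers the original bits. On a word-oriented processor each data bit costs several shift, mask, and XOR instructions, which is the overhead a dedicated unit is meant to remove.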

Novel Bit Manipulation unit for communication digital signal processors

2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), 2004

Co-Authors: Sug Hyun Jeong, Myung Hoon Sunwoo

Abstract:

This paper proposes application-specific instructions and their bit manipulation unit (BMU), which efficiently support scrambling, convolutional encoding, puncturing, and interleaving. The proposed DSP employs the BMU supporting parallel shift and XOR (exclusive-OR) operations and bit insertion/extraction operations on multiple data. The proposed architecture has been modeled in VHDL and synthesized using the SEC 0.18 μm standard cell library, and the gate count of the BMU is only about 1700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.

Design of Bit Manipulation accelerator for communication DSP

Proceedings of 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, 2004

Co-Authors: Suk Hyun Yoon, Sug Hyun Jeong, Myung Hoon Sunwoo

Abstract:

This paper proposes a bit manipulation accelerator (BMA) with application-specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since they have multiply-accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and exclusive-OR (XOR) operations and bit insertion/extraction operations on multiple data. The proposed BMA has been modeled in VHDL and synthesized using the SEC 0.18 μm standard cell library, and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced by about 40% to 80% for scrambling, convolutional encoding and interleaving compared with existing DSPs.
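Convolutional encoding shows why MAC-oriented, word-based datapaths are a poor fit: every input bit requires a shift plus the parity (XOR) of several tapped register bits. The sketch below is a generic rate-1/2 encoder; the constraint length 3 and the generator polynomials 7 and 5 (octal) are a common textbook choice assumed for illustration, not parameters from the paper.

```python
def conv_encode(bits: list, generators: tuple = (0b111, 0b101),
                constraint_length: int = 3) -> list:
    """Rate 1/len(generators) convolutional encoder (illustrative parameters)."""
    mask = (1 << constraint_length) - 1
    state = 0
    out = []
    for b in bits:
        # Shift the new input bit into the register.
        state = ((state << 1) | (b & 1)) & mask
        for g in generators:
            # Each output bit is the XOR (parity) of the tapped register bits.
            out.append(bin(state & g).count("1") & 1)
    return out
```

With the default generators, a single 1 followed by zeros produces the impulse response `[1, 1, 1, 0, 1, 1]`, and the encoder is linear over XOR, so the encoding of the XOR of two sequences equals the XOR of their encodings.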


Akash Kumar - One of the best experts on this subject based on the ideXlab platform.

FPL - Improving autonomous soft-error tolerance of FPGA through LUT configuration Bit Manipulation

2013 23rd International Conference on Field programmable Logic and Applications, 2013

Co-Authors: Shyamsundar Venkataraman, Akash Kumar

Abstract:

Soft errors in the LUT configuration bits of FPGAs can alter the functionality of an implemented design, rendering it useless unless it is reprogrammed. This paper proposes a technique to improve the autonomous fault-masking capabilities of a design by maximizing the number of zeros or ones in LUTs. The technique utilizes spare resources (XOR gates and carry chain) of FPGA devices to selectively manipulate LUT contents using two operations: LUT restructuring and LUT decomposition. Experiments conducted with a wide set of benchmarks from the MCNC, IWLS 2005 and ITC99 benchmark suites on a Xilinx Virtex 6 FPGA board demonstrate that the proposed methodology maximizes logic 0/1 of LUTs by an average of 20%, achieving 80% fault-masking with no area overhead. The fault rate of the entire design is reduced by 60% on average compared to the existing techniques. Further, an additional 5% fault-masking can be achieved with a 7% increase in slice usage.
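As background for why configuration bits matter, a k-input LUT can be modelled as a 2^k-bit truth table that the inputs index into, so a single flipped configuration bit changes the output for exactly one input combination. The sketch below only illustrates that model and a count of ones in the table; it does not reproduce the paper's LUT restructuring or decomposition operations, and the function names are hypothetical.

```python
def lut_eval(config: int, inputs: tuple) -> int:
    """Evaluate a LUT: the input bits form an index into the configuration word."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return (config >> index) & 1


def count_ones(config: int, num_inputs: int) -> int:
    """Number of logic-1 entries in a LUT with `num_inputs` inputs."""
    table_mask = (1 << (1 << num_inputs)) - 1
    return bin(config & table_mask).count("1")
```

For a 2-input AND gate the configuration word is `0b1000`. Flipping bit 0 to give `0b1001` changes the output only for the input pair (0, 0), which is the kind of single-entry upset a soft error produces.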


Chris Johnson - One of the best experts on this subject based on the ideXlab platform.

A Game-Driven Approach to Teaching Bit Manipulation (Abstract Only)

Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, 2017

Co-Authors: Paul Voelker, Chris Johnson

Abstract:

The use of educational games to teach and reinforce concepts to students is an idea that has gained popularity in recent years. Games force students to demonstrate their mastery of a subject by applying its principles to complete a goal or solve a problem. Games also offer more frequent feedback on the student's performance along with immediate rewards. These factors can make games more engaging for the student than traditional homework or quizzes. In this poster, the authors present a program that aims to leverage the advantages games have as a learning tool in order to help students understand the effects of bit manipulation. The player controls a factory with a series of pipes that dispense chocolate into trucks waiting below. Using bitwise operators, the player must manipulate which pipes are open and closed in order to ensure that a pipe is only open if there is a truck aligned beneath it. The player is offered immediate feedback on their performance in the form of empty trucks driving away or wasted chocolate splashing to the ground. Additional challenge can be added to the game by only allowing the player to adjust the pipes one time between each set of trucks. By providing immediate feedback and encouraging creative problem solving, this game may improve students' intuition about the mechanics underlying bit manipulation.
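The game's rule maps directly onto bitwise operators if each pipe and each truck position is one bit of an integer. The sketch below is a hypothetical model of that rule, not code from the game itself; the function names and the 8-position example are assumptions.

```python
def wasted_chocolate(pipes: int, trucks: int) -> int:
    """Positions where a pipe is open but no truck is waiting beneath it."""
    return pipes & ~trucks


def empty_trucks(pipes: int, trucks: int) -> int:
    """Positions where a truck is waiting but the pipe above it is closed."""
    return trucks & ~pipes


def fix_in_one_move(pipes: int, trucks: int) -> int:
    """Toggle exactly the mismatched pipes with a single XOR."""
    return pipes ^ (wasted_chocolate(pipes, trucks) | empty_trucks(pipes, trucks))
```

With `pipes = 0b10110100` and `trucks = 0b10011100`, position 5 wastes chocolate and the truck at position 3 leaves empty. XOR-ing the pipes with the combined mismatch mask `0b00101000` opens and closes exactly the right pipes in one adjustment.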


M. Re - One of the best experts on this subject based on the ideXlab platform.

ICECS - TDES cryptography algorithm acceleration using a reconfigurable functional unit

2014 21st IEEE International Conference on Electronics Circuits and Systems (ICECS), 2014

Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re

Abstract:

Many cryptographic algorithms contain a large number of bit manipulation operations on data. Unfortunately, the Instruction Set Architecture (ISA) of general-purpose microprocessors is usually word oriented. Consequently, the execution of this kind of algorithm is not optimized, and the computation of data represented by single bits or sub-words can require several clock cycles. Reconfigurable hardware accelerators oriented to bit manipulation could accelerate the computation of these algorithms, improving microprocessor performance in terms of execution time. This work presents the experimental results of the speed-up factor obtained for the implementation of the TDES (Triple Data Encryption Standard) algorithm when the Reconfigurable Functional Unit ADAPTO [1] is integrated with a RISC microprocessor (the Altera NIOS-II soft processor [2]). The ADAPTO unit, described in VHDL (VHSIC Hardware Description Language), has been implemented on an Altera Stratix II FPGA and integrated with the Nios soft processor using the Custom Logic feature [4]. The objective is the measurement of the speed-up factor related to the introduction of the reconfigurable hardware accelerator.
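The cost the abstract describes is easy to see in a table-driven bit permutation, the kind of fixed bit shuffling that DES-family ciphers apply to their data blocks. On a word-oriented processor every output bit needs its own shift, mask, and OR. The sketch below uses a generic table; the function name and the 8-bit reversal table in the usage note are illustrative assumptions and are not actual DES permutation tables.

```python
def permute_bits(x: int, table: list) -> int:
    """Output bit i takes the value of input bit table[i].

    One loop iteration (shift, mask, shift, OR) per output bit, which is the
    per-bit cost a word-oriented ISA pays for an arbitrary permutation.
    """
    result = 0
    for out_pos, in_pos in enumerate(table):
        result |= ((x >> in_pos) & 1) << out_pos
    return result
```

For instance, with the reversal table `[7, 6, 5, 4, 3, 2, 1, 0]`, `permute_bits(0b11010000, table)` returns `0b00001011`. A hardware unit that routes all bits at once performs the same mapping without the per-bit loop.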

A Reconfigurable Functional Unit for Modular Operations

Lecture Notes in Electrical Engineering, 2014

Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, S. Pontarelli, M. Re

Abstract:

The efficiency of standard microprocessors decreases when operations on short data are performed because they are optimized to perform operations on fixed-size data. Short data processing and bit manipulation can be accelerated by integrating a Reconfigurable Functional Unit (RFU) in parallel with the ALU. An RFU is a tightly coupled, integrated reconfigurable array used to speed up the computation of a set of operations for which standard microprocessors are not optimized. In this paper we show the benefit of using the Adder-based Dynamic Architecture for Processing Tailored Operators (ADAPTO RFU) [1, 2, 3] (a full-adder-based RFU) on modular operations. In particular, we describe how to speed up modular addition and Montgomery multiplication by using the ADAPTO RFU.
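For readers unfamiliar with the two operations named above, the following is a minimal software sketch of modular addition and of Montgomery multiplication using the standard reduction (REDC) with R = 2^k. It is a reference model of the arithmetic only, not a description of how the ADAPTO RFU implements it; the function names are assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def mod_add(a: int, b: int, n: int) -> int:
    """Modular addition for 0 <= a, b < n: add, then subtract n once if needed."""
    s = a + b
    return s - n if s >= n else s


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: t * R^-1 mod n, with R = 2**k and 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # t + m*n is divisible by R
    return u - n if u >= n else u


def montgomery_multiply(a: int, b: int, n: int, k: int) -> int:
    """Return a * b mod n for an odd modulus n < 2**k using Montgomery form."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # n * n_prime is congruent to -1 mod R
    a_bar = (a * r) % n                 # convert operands to Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)
    return redc(prod_bar, n, k, n_prime)  # convert the result back
```

The point of the method is that `redc` replaces division by n with masks and shifts by k bits. In practice the conversion to Montgomery form is done once and amortized over many multiplications, as in modular exponentiation.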

ACSCC - Integration of butterfly and inverse butterfly nets in embedded processors: Effects on power saving

2012 Conference Record of the Forty Sixth Asilomar Conference on Signals Systems and Computers (ASILOMAR), 2012

Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re

Abstract:

Many software functions are not efficiently executed by standard microprocessors. This happens when the operation granularity and data word length differ from those of the microprocessor's architecture. Important improvements in speed and power can be obtained by integrating hardware accelerators in standard microprocessor architectures. This work, based on [1], shows that the integration of a Bit Manipulation Unit (BMU) [2] in an Altera NIOS-2 soft processor architecture [3] yields significant speed-up and power saving factors.

Algorithm acceleration on LEON-2 processor using a reconfigurable Bit Manipulation unit

2010 8th Workshop on Intelligent Solutions in Embedded Systems, 2010

Co-Authors: G.c. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re

Abstract:

Advanced bit manipulation operations are not efficiently supported by standard microprocessors since they are optimized for fixed data size operations. In the literature, several hardware solutions have been proposed to overcome this problem [1], [3], [4]. In this work we present the experimental results of a new architecture based on LEON-2 and a simplified version of ADAPTO [1] (Adder-based Dynamic Architecture for Processing Tailored Operators), acting as a co-processor. For our experiments we run a set of bit manipulation algorithms on the LEON-2 processor with and without the ADAPTO unit. This makes it possible to measure the speed-up factor obtained using the proposed reconfigurable co-processor.

