Address Register

The Experts below are selected from a list of 234 Experts worldwide ranked by ideXlab platform

J. Ramanujam - One of the best experts on this subject based on the ideXlab platform.

  • Memory optimization techniques for embedded systems
    2020
    Co-Authors: Jinpyo Hong, J. Ramanujam
    Abstract:

    Embedded systems have become ubiquitous, and as a result, optimizing the design and performance of programs that run on these systems remains a significant challenge for the computer systems research community. This dissertation addresses several key problems in the optimization of programs for embedded systems which include digital signal processors as the core processor. Chapter 2 develops an efficient and effective algorithm to construct a worm partition graph by repeatedly finding the current longest worm while maintaining the legality of scheduling. Proper assignment of offsets to variables in embedded DSPs plays a key role in determining the execution time and amount of program memory needed. Chapter 3 proposes a new approach that introduces a weight adjustment function and shows that its experimental results are slightly better than, and at least as good as, the results of previous work. Our solutions address several problems such as handling fragmented paths resulting from graph-based solutions, dealing with modify registers, and the effective utilization of multiple address registers. In addition to offset assignment, address register allocation is important for embedded DSPs. Chapter 4 develops a lower bound and an algorithm that can eliminate the explicit use of address register instructions in loops with array references. Scheduling of computations and the associated memory requirement are closely interrelated for loop computations. In Chapter 5, we develop a general framework for studying the trade-off between scheduling and storage requirements in nested loops that access multi-dimensional arrays. Tiling has long been used to improve the memory performance of loops. Only a sufficient condition for the legality of tiling was known previously. While it was conjectured that the sufficient condition would also become necessary for “large enough” tiles, there had been no precise characterization of what is “large enough.” Chapter 6 develops a new framework for characterizing tiling by viewing tiles as points on a lattice. This also leads to the development of conditions under which the legality condition for tiling is both necessary and sufficient.

  • CODE SIZE REDUCTION FOR ARRAY INTENSIVE APPLICATIONS ON DIGITAL SIGNAL PROCESSORS
    Journal of Circuits Systems and Computers, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Optimizing the code size for applications that run on digital signal processors (DSPs) is a crucial step in generating high-quality and efficient code. Most modern DSPs provide multiple address registers and dedicated address generation units that perform address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range of the address register. Many DSP algorithms have an iterative pattern of references to array elements within loops. Thus, a careful assignment of array references to address registers (called the address register allocation or ARA problem) reduces the number of explicit address arithmetic instructions as well as the execution cycles. In this paper, we present an optimal integer linear programming formulation for the address register allocation problem which incorporates code restructuring techniques. In addition, we have developed a genetic algorithm solution for the ARA problem that allows us to get near-optimal solutions in a reasonable amount of time for large embedded applications. Results on several benchmarks show the effectiveness of our techniques compared to other techniques in the literature.

  • Storage Optimization through Offset Assignment with Variable Coalescing
    ACM Transactions on Embedded Computing Systems, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is used to decrease the number of address arithmetic instructions and thus to generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this article, we present effective heuristics for the simple and the general offset assignment problems with variable coalescing, where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show the significant improvement of our proposed heuristics compared to other heuristics in the literature.

  • An ILP solution to address code generation for embedded applications on digital signal processors
    ACM Transactions on Design Automation of Electronic Systems, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Digital Signal Processors (DSPs) are a family of embedded processors designed under tight memory, area, and cost constraints. Many DSPs use irregular addressing modes where base-plus-offset mode is not supported. However, they often have Address Generation Units (AGUs) that can perform auto-increment/decrement address arithmetic instructions in parallel with Load/Store instructions. This feature can be utilized to reduce the number of explicit address arithmetic instructions and thus reduce the embedded application code size. This code size reduction is essential for this family of DSPs, as the code usually resides in ROM and hence the code size directly translates into silicon area. An effective technique for optimized code generation is offset assignment. This is a well-used technique in the literature to decrease the code size by finding an offset assignment that can effectively utilize auto-increment/decrement. This problem is known as Simple Offset Assignment (SOA) when there is only one address register and as General Offset Assignment (GOA) when multiple address registers are available. In this article, we present an optimal Integer Linear Programming (ILP) solution to the offset assignment problem with variable coalescing, where more than one variable can share the same memory location. Variable permutation is also formulated to find the access sequence that yields the offset assignment with the largest code size reduction. Experimental results on several benchmarks show the effectiveness of our variable permutation technique as well as the large improvement from the ILP-based solutions compared to heuristics.

  • A dynamic heuristic algorithm for offset assignment
    2010 International Computer Symposium (ICS2010), 2010
    Co-Authors: Tong-chai Wang, J. Ramanujam
    Abstract:

    Optimizing the program to be stored in a microprocessor's ROM is an important issue in compiling for embedded processors such as digital signal processors (DSPs). Offset assignment (OA) is a highly effective address code optimization technique for embedded processors with limited memory. The simple offset assignment (SOA) problem concerns the layout of variables for processors with only one address register, and the general offset assignment (GOA) problem deals with multiple address registers. This paper concentrates on SOA for specialized DSPs with Address Generation Units (AGUs). A number of algorithms have been proposed for the SOA problem in past years. In this paper, we propose a new heuristic for SOA that dynamically selects edges by iteratively re-sorting the edge array. We also propose a new edge selection technique that reduces the OA cost in advance. Experimental results on several benchmarks show that our approach not only outperforms previous work but can also be applied to other OA algorithms to yield a significant improvement.
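
The simple offset assignment (SOA) problem described in the abstracts above can be made concrete with a small cost model. The sketch below is illustrative only and is not taken from any of the listed papers: it assumes a single address register with an auto-modify range of 1, a hypothetical access sequence, and an exhaustive search in place of the heuristics and ILP formulations the authors describe. The names `soa_cost` and `best_layout` are made up for this example.

```python
from itertools import permutations


def soa_cost(access_sequence, layout):
    """Count explicit address-register updates for a single address register.

    Assumption: moving between two variables is free when their offsets
    differ by at most 1 (auto-increment/decrement); any larger jump costs
    one explicit address arithmetic instruction.
    """
    offset = {var: i for i, var in enumerate(layout)}
    return sum(
        1
        for a, b in zip(access_sequence, access_sequence[1:])
        if abs(offset[a] - offset[b]) > 1
    )


def best_layout(access_sequence):
    """Try every memory layout; feasible only for a handful of variables,
    since SOA is NP-complete in general."""
    variables = sorted(set(access_sequence))
    return min(
        permutations(variables),
        key=lambda layout: soa_cost(access_sequence, layout),
    )


if __name__ == "__main__":
    sequence = list("abcadb")  # hypothetical access sequence
    declaration_order = tuple(sorted(set(sequence)))
    tuned = best_layout(sequence)
    print("declaration order:", declaration_order, soa_cost(sequence, declaration_order))
    print("searched layout:  ", tuned, soa_cost(sequence, tuned))
```

For realistic programs the exhaustive search is replaced by graph-based heuristics or exact solvers, which is the subject of the papers listed here.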

M Kandemir - One of the best experts on this subject based on the ideXlab platform.

  • Reducing code size through address register assignment
    ACM Transactions on Embedded Computing Systems, 2006
    Co-Authors: G Chen, M Kandemir, M J Irwin, J. Ramanujam
    Abstract:

    In DSP processors, minimizing the number of address calculations is critical for reducing code size and improving performance, since studies of programs have shown that instructions that manipulate address registers constitute a significant portion of the overall instruction count (up to 55%). This work presents a compiler-based optimization strategy to reduce the code size in embedded systems. Our strategy maximizes the use of indirect addressing modes with post-increment/decrement capabilities available in DSP processors. These modes can be exploited by ensuring that successive references to variables access consecutive memory locations. To achieve this spatial locality, our approach uses both access pattern modification (program code restructuring) and memory storage reordering (data layout restructuring). Experimental results on a set of benchmark codes show the effectiveness of our solution and indicate that our approach outperforms the previous approaches to the problem. In addition to resulting in significant reductions in instruction memory (storage) requirements, the proposed technique improves execution time.

  • Optimizing address code generation for array-intensive DSP applications
    International Symposium on Code Generation and Optimization, 2005
    Co-Authors: Guilin Chen, M Kandemir
    Abstract:

    The application code size is a critical design factor for many embedded systems. Unfortunately, most available compilers optimize primarily for speed of execution rather than code density. As a result, the compiler-generated code can be much larger than necessary. In particular, in the DSP domain, past research has found that optimizing address code generation can be very important, since address code can account for over 50% of all program bits. This paper presents a compiler-directed scheme to minimize the number of instructions generated to manipulate the address registers found in DSP architectures. As opposed to most of the prior techniques, which attempt to reduce the number of such instructions through careful address register assignment, this paper proposes modifying loop access patterns in array-intensive signal processing applications. In addition, it demonstrates how the proposed scheme can cooperate with a data layout optimizer to increase its benefits further. We also discuss how optimizations that target effective address code generation can conflict with data locality-enhancing transformations. We evaluate the proposed approach using twelve array-intensive embedded applications. Our experimental results indicate that the proposed approach not only leads to significant reductions in code size but also outperforms prior efforts on reducing code size of array-intensive DSP applications.

  • Address register assignment for reducing code size
    Compiler Construction, 2003
    Co-Authors: M Kandemir, G Chen, M J Irwin, J. Ramanujam
    Abstract:

    In DSP processors, minimizing the number of address calculations is critical for reducing code size and improving performance, since studies of programs have shown that instructions that manipulate address registers constitute a significant portion of the overall instruction count (up to 55%). This work presents a compiler-based optimization strategy to reduce the code size in embedded systems. Our strategy maximizes the use of indirect addressing modes with post-increment and post-decrement capabilities available in DSP processors. These modes can be exploited by ensuring that successive references to variables access consecutive memory locations. To achieve this spatial locality, our approach uses both access pattern modification (program code restructuring) and memory storage reordering (data layout restructuring).
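
The two levers named in the abstracts above, access pattern modification and data layout restructuring, can be illustrated with a toy model of a two-dimensional array walk. This is a sketch under stated assumptions rather than the authors' compiler analysis: the array shape, the post-increment range of 1, and the helper names (`explicit_updates`, `row_major`, `column_major`, `cost`) are all hypothetical.

```python
def explicit_updates(addresses, modify_range=1):
    """Count steps whose address delta cannot be covered by
    post-increment/decrement (assumed range: +/- modify_range)."""
    return sum(
        1 for a, b in zip(addresses, addresses[1:]) if abs(b - a) > modify_range
    )


def row_major(i, j, rows, cols):
    """Linear address of element (i, j) when rows are stored contiguously."""
    return i * cols + j


def column_major(i, j, rows, cols):
    """Linear address of element (i, j) when columns are stored contiguously."""
    return j * rows + i


ROWS, COLS = 4, 8  # hypothetical array shape

# Original loop nest: j is the outer loop, i the inner loop (column-wise walk).
column_walk = [(i, j) for j in range(COLS) for i in range(ROWS)]
# Access pattern modification: interchange the two loops (row-wise walk).
row_walk = [(i, j) for i in range(ROWS) for j in range(COLS)]


def cost(walk, layout):
    """Explicit address instructions needed for a walk under a given layout."""
    return explicit_updates([layout(i, j, ROWS, COLS) for i, j in walk])


if __name__ == "__main__":
    print("column walk, row-major layout:   ", cost(column_walk, row_major))
    print("row walk, row-major layout:      ", cost(row_walk, row_major))
    print("column walk, column-major layout:", cost(column_walk, column_major))
```

In this model either transformation alone makes consecutive references touch consecutive addresses; a real compiler must also check that the loop interchange is legal and weigh the layout change against other uses of the array.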

Hassan Salamy - One of the best experts on this subject based on the ideXlab platform.

  • Minimizing address arithmetic instructions in embedded applications on DSPs
    Computers & Electrical Engineering, 2012
    Co-Authors: Hassan Salamy
    Abstract:

    Address arithmetic instructions constitute a large part of the generated code for digital signal processors (DSPs). Most modern DSPs provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is used to reduce the number of address arithmetic instructions and thus generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this article, we present effective solutions using simulated annealing (SA) for the simple and the general offset assignment problems with variable coalescing, where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show the significant improvement from our proposed techniques compared to other heuristics in the literature.

  • CODE SIZE REDUCTION FOR ARRAY INTENSIVE APPLICATIONS ON DIGITAL SIGNAL PROCESSORS
    Journal of Circuits Systems and Computers, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Optimizing the code size for applications that run on digital signal processors (DSPs) is a crucial step in generating high-quality and efficient code. Most modern DSPs provide multiple address registers and dedicated address generation units that perform address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range of the address register. Many DSP algorithms have an iterative pattern of references to array elements within loops. Thus, a careful assignment of array references to address registers (called the address register allocation or ARA problem) reduces the number of explicit address arithmetic instructions as well as the execution cycles. In this paper, we present an optimal integer linear programming formulation for the address register allocation problem which incorporates code restructuring techniques. In addition, we have developed a genetic algorithm solution for the ARA problem that allows us to get near-optimal solutions in a reasonable amount of time for large embedded applications. Results on several benchmarks show the effectiveness of our techniques compared to other techniques in the literature.

  • Storage Optimization through Offset Assignment with Variable Coalescing
    ACM Transactions on Embedded Computing Systems, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is used to decrease the number of address arithmetic instructions and thus to generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register, and the general offset assignment (GOA) problem deals with multiple address registers. Both of these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this article, we present effective heuristics for the simple and the general offset assignment problems with variable coalescing, where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show the significant improvement of our proposed heuristics compared to other heuristics in the literature.

  • An ILP solution to address code generation for embedded applications on digital signal processors
    ACM Transactions on Design Automation of Electronic Systems, 2012
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Digital Signal Processors (DSPs) are a family of embedded processors designed under tight memory, area, and cost constraints. Many DSPs use irregular addressing modes where base-plus-offset mode is not supported. However, they often have Address Generation Units (AGUs) that can perform auto-increment/decrement address arithmetic instructions in parallel with Load/Store instructions. This feature can be utilized to reduce the number of explicit address arithmetic instructions and thus reduce the embedded application code size. This code size reduction is essential for this family of DSPs, as the code usually resides in ROM and hence the code size directly translates into silicon area. An effective technique for optimized code generation is offset assignment. This is a well-used technique in the literature to decrease the code size by finding an offset assignment that can effectively utilize auto-increment/decrement. This problem is known as Simple Offset Assignment (SOA) when there is only one address register and as General Offset Assignment (GOA) when multiple address registers are available. In this article, we present an optimal Integer Linear Programming (ILP) solution to the offset assignment problem with variable coalescing, where more than one variable can share the same memory location. Variable permutation is also formulated to find the access sequence that yields the offset assignment with the largest code size reduction. Experimental results on several benchmarks show the effectiveness of our variable permutation technique as well as the large improvement from the ILP-based solutions compared to heuristics.

  • ESTImedia - Optimal address register allocation for arrays in DSP applications
    2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008
    Co-Authors: Hassan Salamy, J. Ramanujam
    Abstract:

    Optimizing the code size of a digital signal processing application is a crucial step in generating high-quality and efficient code for embedded systems. Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU) that performs address generation in parallel with instruction execution. There is no address computation overhead if the next address is within the auto-modify range. Many DSP algorithms have an iterative pattern of references to array elements within loops. Thus, a careful assignment of array references to address registers reduces the number of explicit address register instructions as well as the execution cycles. In this paper, we present an optimal integer linear programming (ILP) formulation for the address register allocation (ARA) problem with code restructuring techniques. A genetic algorithm is also used to solve the ARA problem to get a near-optimal solution in a reasonable amount of time for large embedded applications. Results on several benchmarks show the effectiveness of our techniques compared to other techniques in the literature.
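
The address register allocation (ARA) problem described in the abstracts above can be sketched with a simplified cost model. Everything in this example is an assumption made for illustration: references are modeled as index offsets such as `A[i+4]`, the loop step and auto-modify range are both 1, and a brute-force search stands in for the ILP and genetic-algorithm solutions the authors report. The function names `register_cost` and `allocate` are hypothetical.

```python
from itertools import product


def register_cost(offsets, step=1, modify_range=1):
    """Explicit address updates per loop iteration for one address register.

    `offsets` are the array index offsets this register serves, in program
    order. A move is free when it stays within the assumed auto-modify range.
    """
    if not offsets:
        return 0
    cost = sum(
        1 for a, b in zip(offsets, offsets[1:]) if abs(b - a) > modify_range
    )
    # Wrap-around: from the last reference of this iteration to the first
    # reference of the next iteration, whose index has advanced by `step`.
    if abs(offsets[0] + step - offsets[-1]) > modify_range:
        cost += 1
    return cost


def allocate(offsets, num_registers, step=1, modify_range=1):
    """Exhaustively assign each reference to a register and return the
    cheapest (cost, assignment) pair; exponential, so small inputs only."""
    best = None
    for assignment in product(range(num_registers), repeat=len(offsets)):
        groups = [
            [o for o, r in zip(offsets, assignment) if r == reg]
            for reg in range(num_registers)
        ]
        cost = sum(register_cost(g, step, modify_range) for g in groups)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    # Hypothetical loop body referencing A[i], A[i+4], A[i+1], A[i+5].
    refs = [0, 4, 1, 5]
    print("one register: ", allocate(refs, 1))
    print("two registers:", allocate(refs, 2))
```

With a second register the references split into two runs of adjacent addresses, so in this model every update is absorbed by auto-modify; the search space grows exponentially with the number of references, which is why exact and metaheuristic solvers are used in practice.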

Guilin Chen - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing address code generation for array-intensive DSP applications
    International Symposium on Code Generation and Optimization, 2005
    Co-Authors: Guilin Chen, M Kandemir
    Abstract:

    The application code size is a critical design factor for many embedded systems. Unfortunately, most available compilers optimize primarily for speed of execution rather than code density. As a result, the compiler-generated code can be much larger than necessary. In particular, in the DSP domain, past research has found that optimizing address code generation can be very important, since address code can account for over 50% of all program bits. This paper presents a compiler-directed scheme to minimize the number of instructions generated to manipulate the address registers found in DSP architectures. As opposed to most of the prior techniques, which attempt to reduce the number of such instructions through careful address register assignment, this paper proposes modifying loop access patterns in array-intensive signal processing applications. In addition, it demonstrates how the proposed scheme can cooperate with a data layout optimizer to increase its benefits further. We also discuss how optimizations that target effective address code generation can conflict with data locality-enhancing transformations. We evaluate the proposed approach using twelve array-intensive embedded applications. Our experimental results indicate that the proposed approach not only leads to significant reductions in code size but also outperforms prior efforts on reducing code size of array-intensive DSP applications.

Jean-luc Danger - One of the best experts on this subject based on the ideXlab platform.

  • From cryptography to hardware: analyzing and protecting embedded Xilinx BRAM for cryptographic applications
    Journal of Cryptographic Engineering, 2013
    Co-Authors: Shivam Bhasin, Annelie Heuser, Sylvain Guilley, Jean-luc Danger
    Abstract:

    The design of cryptographic applications needs special care. For instance, physical attacks like side-channel analysis (SCA) are able to recover the secret key just by observing the activity of the computation, even for mathematically robust algorithms like AES. SCA considers the “leakage” of a well-chosen intermediate variable correlated with the secret. Field-programmable gate arrays (FPGAs) are often used for hardware implementations for low- to medium-volume production or when flexibility is needed. They offer many possibilities for the computation, like small look-up tables (LUTs) and embedded block memories (BRAMs). Certain countermeasures can be deployed, like dual-rail logic or masking, to resist SCA on FPGAs. However, to design an effective countermeasure, it is of prime importance for a designer to know the main leakage sources of the device. In this paper, we analyze the leakage source of a Xilinx Virtex V FPGA by studying three different AES architectures. The analysis is based on real measurements using specific leakage models of the sensitive variable, adapted to each architecture. Our results demonstrate that BRAMs, which were traditionally considered to leak less, are found to be equally vulnerable if we change the attack target from the address register to the output latch. We also show that if the leakage model is known, simple countermeasures with only 16% overhead can be deployed to overcome the leakage.

  • From Cryptography to Hardware: Analyzing Embedded Xilinx BRAM for Cryptographic Applications
    2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops, 2012
    Co-Authors: Shivam Bhasin, Sylvain Guilley, Jean-luc Danger
    Abstract:

    The design of cryptographic applications needs special care. For instance, physical attacks like Side-Channel Analysis (SCA) are able to recover the secret key just by observing the activity of the computation, even for mathematically robust algorithms like AES. SCA considers the "leakage" of a well-chosen intermediate variable correlated with the secret. Field-programmable gate arrays (FPGAs) are often used for hardware implementations for low- to medium-volume production or when flexibility is needed. They offer many possibilities for the computation, like small Look-Up Tables (LUTs) and embedded block memories (BRAMs). Certain countermeasures can be deployed, like dual-rail logic or masking, to resist SCA on FPGAs. However, to design an effective countermeasure, it is of prime importance for a designer to know the main leakage sources of the device. In this article, we analyze the leakage source of a Xilinx Virtex V FPGA by studying 3 different AES architectures. The analysis is based on real measurements using specific leakage models of the sensitive variable, adapted to each architecture. Our results demonstrate that BRAMs, which were traditionally considered to leak less, are found to be equally vulnerable if we change the attack target from the address register to the output latch. Hence, by providing important clues about the leakage, this study allows designers to enhance the robustness of their FPGA implementations.