Address Offset - Explore the Science & Experts | ideXlab

Address Offset

The experts below are selected from a list of 8823 experts worldwide ranked by the ideXlab platform.

Edwin H M Sha – One of the best experts on this subject based on the ideXlab platform.

  • Optimized Address Assignment With Array and Loop Transformations for Minimizing Schedule Length
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2008
    Co-Authors: Chun Jason Xue, Zili Shao, Zhiping Jia, Meng Wang, Edwin H M Sha

    Abstract:

    Reducing address arithmetic operations by optimizing address offset assignment greatly improves the performance of digital signal processor (DSP) applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. Little research has been conducted on loop optimization together with the address offset assignment problem for architectures with multiple functional units. In this paper, we combine loop scheduling, array interleaving, and address assignment to minimize the schedule length and the number of address operations for loops on DSP architectures with multiple functional units. Array interleaving is applied to optimize address assignment for arrays in the loop scheduling process. An algorithm, address operation reduction rotation scheduling (AORRS), is proposed. The algorithm minimizes both schedule length and the number of address operations. Compared with list scheduling, AORRS shows an average reduction of 38.4% in schedule length and an average reduction of 31.7% in the number of address operations. Compared with rotation scheduling, AORRS shows an average reduction of 15.9% in schedule length and 33.6% in the number of address operations.
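
The effect of array interleaving on address operations can be illustrated with a small cost model. This is a minimal sketch, not the AORRS algorithm from the abstract: it assumes a single address register whose auto-increment/decrement covers a step of at most 1, and a hypothetical loop body that reads A[i] and then B[i]. The function names and the array size are illustrative assumptions.

```python
def count_address_ops(addresses, auto_range=1):
    """Count explicit address-register updates for one address register.

    Assumption: moving between consecutive accesses is free when the
    distance fits the auto-increment/decrement range; otherwise it costs
    one explicit address operation.
    """
    ops = 0
    for prev, cur in zip(addresses, addresses[1:]):
        if abs(cur - prev) > auto_range:
            ops += 1
    return ops


def separate_layout(n):
    # A occupies addresses [0, n), B occupies [n, 2n).
    return {"A": lambda i: i, "B": lambda i: n + i}


def interleaved_layout(n):
    # A[i] is stored at 2i and B[i] at 2i + 1.
    return {"A": lambda i: 2 * i, "B": lambda i: 2 * i + 1}


def loop_addresses(layout, n):
    # Hypothetical loop body: read A[i], then B[i], for i = 0..n-1.
    sequence = []
    for i in range(n):
        sequence.append(layout["A"](i))
        sequence.append(layout["B"](i))
    return sequence


n = 8  # illustrative array length
separate_ops = count_address_ops(loop_addresses(separate_layout(n), n))
interleaved_ops = count_address_ops(loop_addresses(interleaved_layout(n), n))
print(separate_ops, interleaved_ops)
```

Under these assumptions the separate layout needs an explicit address update on every step, while the interleaved layout is walked entirely by auto-increment. Real DSPs have several address registers and modify registers, so actual savings depend on the target.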

  • Optimizing DSP scheduling via address assignment with array and loop transformation
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005
    Co-Authors: Chun Xue, Zili Shao, Ying Chen, Edwin H M Sha

    Abstract:

    Reducing address arithmetic instructions by optimizing address offset assignment greatly improves the performance of DSP applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. In this paper, we exploit address assignment and scheduling for applications with loops on DSPs with multiple functional units. Array transformation is used in our approach to leverage the indirect addressing modes provided by most DSP architectures. An algorithm, address instruction reduction loop scheduling (AIRLS), is proposed. The algorithm uses rotation scheduling, address assignment, and array transformation to minimize both address instructions and schedule length. Compared with list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and an average reduction of 38.3% in address instructions. Compared with rotation scheduling, AIRLS shows an average reduction of 19.2% in schedule length and 39.5% in the number of address instructions.
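
The underlying address offset assignment problem for scalar variables can be sketched as a search over memory layouts. The example below is an illustration only and is not AIRLS: it assumes one address register with an auto-increment/decrement range of 1, a hypothetical access sequence, and a brute-force search that is practical only for a handful of variables.

```python
from itertools import permutations


def address_instruction_count(access_sequence, layout, auto_range=1):
    """Explicit address instructions needed to follow access_sequence.

    layout lists the variables in memory order, so a variable's offset is
    its index in the list. A move within auto_range is assumed to be free.
    """
    offset = {var: pos for pos, var in enumerate(layout)}
    count = 0
    for prev, cur in zip(access_sequence, access_sequence[1:]):
        if abs(offset[cur] - offset[prev]) > auto_range:
            count += 1
    return count


def best_layout(access_sequence):
    """Exhaustively try every layout and return (cost, layout) of a cheapest one."""
    variables = sorted(set(access_sequence))
    return min(
        (address_instruction_count(access_sequence, list(p)), list(p))
        for p in permutations(variables)
    )


accesses = ["a", "c", "a", "d", "b", "d"]  # hypothetical access sequence
naive_cost = address_instruction_count(accesses, ["a", "b", "c", "d"])
best_cost, layout = best_layout(accesses)
print(naive_cost, best_cost, layout)
```

Declaration order costs an address instruction on every transition here, while a layout that places consecutively accessed variables next to each other needs none. Practical compilers use heuristics rather than exhaustive search, since the number of layouts grows factorially.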

Huiyang Zhou – One of the best experts on this subject based on the ideXlab platform.

  • A GPGPU compiler for memory optimization and parallelism management
    Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '10), 2010
    Co-Authors: Yi Yang, Ping Xiang, Jingfei Kong, Huiyang Zhou

    Abstract:

    This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of the GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function, which is functionally correct but written without any consideration for performance optimization. The compiler analyzes the code, identifies its memory access patterns, and generates both the optimized kernel and the kernel invocation parameters. Our optimization process includes vectorization and memory coalescing for memory bandwidth enhancement, tiling and unrolling for data reuse and parallelism management, and thread block remapping or address offset insertion for partition-camping elimination. The experiments on a set of scientific and media processing algorithms show that our optimized code achieves very high performance, either superior or very close to the highly fine-tuned library, NVIDIA CUBLAS 2.2, and speedups of up to 128 times over the naive versions. Another distinguishing feature of our compiler is the understandability of the optimized code, which is useful for performance analysis and algorithm refinement.
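
Partition camping and the idea behind address offset insertion can be modeled in a few lines. This is a simplified sketch and not the compiler's actual transformation: the partition count and width are illustrative assumptions, and the per-block rotated starting offset is one plausible way an inserted offset can spread concurrent accesses across partitions.

```python
from collections import Counter

NUM_PARTITIONS = 8     # assumed number of memory partitions
PARTITION_WIDTH = 256  # assumed partition width in bytes


def partition_of(byte_address):
    # Consecutive PARTITION_WIDTH-byte chunks map to partitions round-robin.
    return (byte_address // PARTITION_WIDTH) % NUM_PARTITIONS


def partitions_hit(num_blocks, row_bytes, step, use_offset):
    """Count how many thread blocks touch each partition at one time step.

    Block b walks its own row in PARTITION_WIDTH-sized chunks. With
    use_offset, block b starts b chunks into its row and wraps around,
    so it still visits every chunk of the row exactly once.
    """
    chunks_per_row = row_bytes // PARTITION_WIDTH
    hits = Counter()
    for b in range(num_blocks):
        start = b if use_offset else 0
        chunk = (start + step) % chunks_per_row
        address = b * row_bytes + chunk * PARTITION_WIDTH
        hits[partition_of(address)] += 1
    return hits


row_bytes = NUM_PARTITIONS * PARTITION_WIDTH  # row size that aligns every row start
camped = partitions_hit(8, row_bytes, step=0, use_offset=False)
spread = partitions_hit(8, row_bytes, step=0, use_offset=True)
print(dict(camped), dict(spread))
```

Without the offset, all eight blocks queue on a single partition at each step; with the per-block offset, each block lands on a different partition. Actual partition geometry depends on the GPU generation.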

Zili Shao – One of the best experts on this subject based on the ideXlab platform.

  • Optimized Address Assignment With Array and Loop Transformations for Minimizing Schedule Length
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2008
    Co-Authors: Chun Jason Xue, Zili Shao, Zhiping Jia, Meng Wang, Edwin H M Sha

    Abstract:

    Reducing address arithmetic operations by optimizing address offset assignment greatly improves the performance of digital signal processor (DSP) applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. Little research has been conducted on loop optimization together with the address offset assignment problem for architectures with multiple functional units. In this paper, we combine loop scheduling, array interleaving, and address assignment to minimize the schedule length and the number of address operations for loops on DSP architectures with multiple functional units. Array interleaving is applied to optimize address assignment for arrays in the loop scheduling process. An algorithm, address operation reduction rotation scheduling (AORRS), is proposed. The algorithm minimizes both schedule length and the number of address operations. Compared with list scheduling, AORRS shows an average reduction of 38.4% in schedule length and an average reduction of 31.7% in the number of address operations. Compared with rotation scheduling, AORRS shows an average reduction of 15.9% in schedule length and 33.6% in the number of address operations.

  • Optimizing DSP scheduling via address assignment with array and loop transformation
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005
    Co-Authors: Chun Xue, Zili Shao, Ying Chen, Edwin H M Sha

    Abstract:

    Reducing address arithmetic instructions by optimizing address offset assignment greatly improves the performance of DSP applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. In this paper, we exploit address assignment and scheduling for applications with loops on DSPs with multiple functional units. Array transformation is used in our approach to leverage the indirect addressing modes provided by most DSP architectures. An algorithm, address instruction reduction loop scheduling (AIRLS), is proposed. The algorithm uses rotation scheduling, address assignment, and array transformation to minimize both address instructions and schedule length. Compared with list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and an average reduction of 38.3% in address instructions. Compared with rotation scheduling, AIRLS shows an average reduction of 19.2% in schedule length and 39.5% in the number of address instructions.