The Experts below are selected from a list of 8823 Experts worldwide ranked by the ideXlab platform

Edwin H M Sha - One of the best experts on this subject based on the ideXlab platform.

  • Optimized Address Assignment With Array and Loop Transformations for Minimizing Schedule Length
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2008
    Co-Authors: Chun Jason Xue, Zili Shao, Zhiping Jia, Meng Wang, Edwin H M Sha
    Abstract:

    Reducing address arithmetic operations by optimizing address offset assignment greatly improves the performance of digital signal processor (DSP) applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units, and little research has been conducted on loop optimization combined with the address offset assignment problem for such architectures. In this paper, we combine loop scheduling, array interleaving, and address assignment to minimize the schedule length and the number of address operations for loops on DSP architectures with multiple functional units. Array interleaving is applied to optimize address assignment for arrays during the loop scheduling process. An algorithm, address operation reduction rotation scheduling (AORRS), is proposed that minimizes both schedule length and the number of address operations. Compared with list scheduling, AORRS shows an average reduction of 38.4% in schedule length and 31.7% in the number of address operations; compared with rotation scheduling, it shows average reductions of 15.9% in schedule length and 33.6% in the number of address operations.
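    To make the underlying offset-assignment idea concrete, here is a minimal sketch of our own (not the paper's AORRS algorithm), assuming a single address register whose +/-1 auto-increment/decrement updates are free while any larger move costs one explicit address-arithmetic instruction:

```python
# Toy model of address offset assignment: with auto-increment/decrement
# addressing, moving the address register to an adjacent memory word is
# free, while any larger jump needs an explicit address instruction.

def address_ops(layout, access_seq):
    """Count explicit address instructions for a variable->offset layout."""
    pos = {v: i for i, v in enumerate(layout)}
    ops = 0
    cur = pos[access_seq[0]]          # initial load of the address register
    for v in access_seq[1:]:
        nxt = pos[v]
        if abs(nxt - cur) > 1:        # outside auto-inc/dec range: extra op
            ops += 1
        cur = nxt
    return ops

seq = ["a", "b", "c", "a", "b", "d", "a"]
print(address_ops(["a", "b", "c", "d"], seq))  # declaration order -> 3
print(address_ops(["c", "b", "a", "d"], seq))  # sequence-aware layout -> 2
```

    The access-sequence-aware layout needs fewer explicit address instructions; this is the effect that offset-assignment optimization exploits.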

  • ICASSP (5) - Optimizing DSP scheduling via Address assignment with array and loop transformation
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005, vol. 1
    Co-Authors: Chun Xue, Zili Shao, Ying Chen, Edwin H M Sha
    Abstract:

    Reducing address arithmetic instructions by optimizing address offset assignment greatly improves the performance of DSP applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. In this paper, we exploit address assignment and scheduling for applications with loops on DSPs with multiple functional units. Array transformation is used in our approach to leverage the indirect addressing modes provided by most DSP architectures. An algorithm, address instruction reduction loop scheduling (AIRLS), is proposed. The algorithm combines rotation scheduling, address assignment, and array transformation to minimize both address instructions and schedule length. Compared with list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and 38.3% in the number of address instructions. Compared with rotation scheduling, AIRLS shows an average reduction of 19.2% in schedule length and 39.5% in the number of address instructions.
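    The array-transformation idea can be illustrated with a small sketch of our own (a toy model, not AIRLS itself): assuming only +/-1 auto-increment/decrement addressing is free, interleaving two arrays that a loop accesses alternately removes the long hops between them:

```python
# Toy model of array interleaving: when a loop alternates between a[i]
# and b[i], interleaving the two arrays makes every consecutive access
# land on the next memory word, so auto-increment addressing needs no
# extra address instructions.

def address_ops(addresses):
    """Extra address instructions when only +/-1 auto-inc/dec is free."""
    return sum(1 for prev, nxt in zip(addresses, addresses[1:])
               if abs(nxt - prev) != 1)

n = 4
# Separate arrays: a at words 0..3, b at words 4..7; loop reads a[i], b[i].
separate = [addr for i in range(n) for addr in (i, n + i)]
# Interleaved: a[i] at word 2*i, b[i] at word 2*i + 1.
interleaved = [addr for i in range(n) for addr in (2 * i, 2 * i + 1)]

print(address_ops(separate))     # every a[i] -> b[i] hop costs an op -> 7
print(address_ops(interleaved))  # all hops are +1 -> 0
```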

Huiyang Zhou - One of the best experts on this subject based on the ideXlab platform.

  • A GPGPU Compiler for Memory Optimization and Parallelism Management
    Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '10), 2010
    Co-Authors: Yi Yang, Jingfei Kong, Ping Xiang, Huiyang Zhou
    Abstract:

    This paper presents a novel optimizing compiler for general-purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high-performance GPGPU programs: effective utilization of the GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function that is functionally correct but written without any consideration for performance. The compiler analyzes the code, identifies its memory access patterns, and generates both the optimized kernel and the kernel invocation parameters. The optimization process includes vectorization and memory coalescing for memory bandwidth enhancement, tiling and unrolling for data reuse and parallelism management, and thread-block remapping or address-offset insertion for partition-camping elimination. Experiments on a set of scientific and media processing algorithms show that the optimized code achieves very high performance, superior or very close to the highly tuned NVIDIA CUBLAS 2.2 library, with speedups of up to 128x over the naive versions. Another distinguishing feature of our compiler is the understandability of the optimized code, which is useful for performance analysis and algorithm refinement.
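    As a rough illustration of the partition-camping problem that address-offset insertion targets (a hypothetical model of our own; the partition count, interleaving granularity, and row stride below are assumptions, not values from the paper):

```python
# Toy model of partition camping: if every thread block starts its row
# accesses at column 0, all blocks initially hit the same DRAM partition;
# inserting a block-dependent address offset spreads the first accesses
# across partitions.

NUM_PARTITIONS = 8         # assumed number of DRAM partitions
WORDS_PER_PARTITION = 64   # assumed partition interleaving granularity

def partition_of(addr):
    """Which partition a word address maps to under round-robin interleaving."""
    return (addr // WORDS_PER_PARTITION) % NUM_PARTITIONS

row_stride = 512  # assumed words per matrix row

# Naive: block b reads element (b, 0) first -> address b * row_stride.
naive_first = [partition_of(b * row_stride) for b in range(8)]
# With address-offset insertion: block b starts one partition further along.
offset_first = [partition_of(b * row_stride + b * WORDS_PER_PARTITION)
                for b in range(8)]

print(naive_first)   # all blocks camp on partition 0
print(offset_first)  # first accesses spread across all 8 partitions
```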

Zili Shao - One of the best experts on this subject based on the ideXlab platform.

  • Optimized Address Assignment With Array and Loop Transformations for Minimizing Schedule Length
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2008
    Co-Authors: Chun Jason Xue, Zili Shao, Zhiping Jia, Meng Wang, Edwin H M Sha
    Abstract:

    Reducing address arithmetic operations by optimizing address offset assignment greatly improves the performance of digital signal processor (DSP) applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units, and little research has been conducted on loop optimization combined with the address offset assignment problem for such architectures. In this paper, we combine loop scheduling, array interleaving, and address assignment to minimize the schedule length and the number of address operations for loops on DSP architectures with multiple functional units. Array interleaving is applied to optimize address assignment for arrays during the loop scheduling process. An algorithm, address operation reduction rotation scheduling (AORRS), is proposed that minimizes both schedule length and the number of address operations. Compared with list scheduling, AORRS shows an average reduction of 38.4% in schedule length and 31.7% in the number of address operations; compared with rotation scheduling, it shows average reductions of 15.9% in schedule length and 33.6% in the number of address operations.

  • ICASSP (5) - Optimizing DSP scheduling via Address assignment with array and loop transformation
    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005, vol. 1
    Co-Authors: Chun Xue, Zili Shao, Ying Chen, Edwin H M Sha
    Abstract:

    Reducing address arithmetic instructions by optimizing address offset assignment greatly improves the performance of DSP applications. However, minimizing address operations alone may not directly reduce code size and schedule length for DSPs with multiple functional units. In this paper, we exploit address assignment and scheduling for applications with loops on DSPs with multiple functional units. Array transformation is used in our approach to leverage the indirect addressing modes provided by most DSP architectures. An algorithm, address instruction reduction loop scheduling (AIRLS), is proposed. The algorithm combines rotation scheduling, address assignment, and array transformation to minimize both address instructions and schedule length. Compared with list scheduling, AIRLS shows an average reduction of 35.4% in schedule length and 38.3% in the number of address instructions. Compared with rotation scheduling, AIRLS shows an average reduction of 19.2% in schedule length and 39.5% in the number of address instructions.

Jun Yang - One of the best experts on this subject based on the ideXlab platform.

  • Procedural Level Address Offset Assignment of DSP Applications with Loops
    International Conference on Parallel Processing, 2003
    Co-Authors: Youtao Zhang, Jun Yang
    Abstract:

    Automatic optimization of address offset assignment for DSP applications, which reduces the number of address arithmetic instructions to meet tight memory size restrictions and performance requirements, has received a lot of attention in recent years. However, most current research focuses on the basic block level and does not distinguish different program structures, especially loops. Moreover, the effectiveness of the modify register (MR) is not fully exploited, since it is used only in a post-optimization step. A novel address offset assignment approach is proposed at the procedural level. The MR is used effectively in the address assignment for loop structures. By taking advantage of the MR, variables accessed in sequence within a loop are assigned to memory words at equal distances. Both static and dynamic addressing instruction counts are greatly reduced. For the DSPSTONE benchmarks, average improvements of 9.9%, 17.1%, and 21.8% are achieved over address offset assignment [R. Leupers et al., 1996] together with MR optimization when there are 1, 2, and 4 address registers, respectively.
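    The modify-register idea can be sketched as follows (our simplified model, not the paper's algorithm): assume +/-1 auto-increment/decrement is free, adding the current MR value to the address register is also free, and any other address update costs one instruction, which for simplicity we treat as also setting the MR. Placing the variables a loop touches in sequence at equal distances then lets one MR value cover every hop:

```python
# Toy model of modify-register (MR) addressing: a hop of +/-1 is free,
# a hop equal to the current MR value is free, and any other hop costs
# one explicit address instruction (which we assume also loads the MR).

def address_ops_with_mr(layout, seq, iterations):
    """Explicit address instructions for a loop body repeated `iterations` times."""
    pos = {v: i for i, v in enumerate(layout)}
    accesses = [pos[v] for v in seq] * iterations
    ops, mr = 0, None
    for prev, nxt in zip(accesses, accesses[1:]):
        step = nxt - prev
        if abs(step) != 1 and step != mr:
            ops += 1          # explicit address op; MR now holds this step
            mr = step
    return ops

# A loop that touches a, c, e each iteration, run 100 times.
seq = ["a", "c", "e"]
print(address_ops_with_mr(["a", "b", "c", "d", "e"], seq, 100))  # -> 199
print(address_ops_with_mr(["a", "c", "e", "b", "d"], seq, 100))  # -> 1
```

    In the second layout the loop's variables sit at equal distances, so after one MR load every hop in every iteration is free, which mirrors the large dynamic-count reductions the paper reports.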

Yi Yang - One of the best experts on this subject based on the ideXlab platform.

  • A GPGPU Compiler for Memory Optimization and Parallelism Management
    Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '10), 2010
    Co-Authors: Yi Yang, Jingfei Kong, Ping Xiang, Huiyang Zhou
    Abstract:

    This paper presents a novel optimizing compiler for general-purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high-performance GPGPU programs: effective utilization of the GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naive GPU kernel function that is functionally correct but written without any consideration for performance. The compiler analyzes the code, identifies its memory access patterns, and generates both the optimized kernel and the kernel invocation parameters. The optimization process includes vectorization and memory coalescing for memory bandwidth enhancement, tiling and unrolling for data reuse and parallelism management, and thread-block remapping or address-offset insertion for partition-camping elimination. Experiments on a set of scientific and media processing algorithms show that the optimized code achieves very high performance, superior or very close to the highly tuned NVIDIA CUBLAS 2.2 library, with speedups of up to 128x over the naive versions. Another distinguishing feature of our compiler is the understandability of the optimized code, which is useful for performance analysis and algorithm refinement.
