Loop Vectorization

The experts below are selected from a list of 84 experts worldwide, ranked by the ideXlab platform.

Ayal Zaks - One of the best experts on this subject based on the ideXlab platform.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen
    Abstract:

    Optimizing compilers strive to construct efficient executables by applying sequences of transformations. Additional transformations are constantly being devised, with various mutual interactions among them, thereby exacerbating the notoriously difficult phase-ordering problem: that of deciding which transformations to apply and in which order. Fortunately, new infrastructures such as the polyhedral compilation framework host a variety of transformations, facilitating the efficient exploration and configuration of multiple transformation sequences. Many powerful optimizations, however, remain external to the polyhedral framework, with potential mutual interactions that need to be considered. In this paper we examine the interactions between loop transformations of the polyhedral compilation framework and subsequent vectorization optimizations targeting fine-grain SIMD data-level parallelism. Automatic vectorization involves many low-level, target-specific considerations and transformations, which currently exclude it from being part of the polyhedral framework. In order to consider potential interactions among polyhedral loop transformations and vectorization, we first model the performance impact of the different loop transformations and vectorization strategies, and then show how this cost model can be integrated seamlessly into the polyhedral representation. This predictive modelling then facilitates efficient exploration and educated decision making on how best to apply various polyhedral loop transformations while considering the subsequent effects of different vectorization schemes. Our work demonstrates the feasibility and benefit of tuning the polyhedral model in the context of vectorization. Experimental results confirm that our model makes accurate predictions, providing speedups of over 2x on average over traditional innermost-loop vectorization on PowerPC970 and Cell-SPU SIMD platforms.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen
    Abstract:

    Optimizing compilers apply numerous interdependent optimizations, leading to the notoriously difficult phase-ordering problem: that of deciding which transformations to apply and in which order. Fortunately, new infrastructures such as the polyhedral compilation framework host a variety of transformations, facilitating the efficient exploration and configuration of multiple transformation sequences. Many powerful optimizations, however, remain external to the polyhedral framework, including vectorization. The low-level, target-specific aspects of vectorization for fine-grain SIMD have so far excluded it from being part of the polyhedral framework. In this paper we examine the interactions between loop transformations of the polyhedral framework and subsequent vectorization. We model the performance impact of the different loop transformations and vectorization strategies, and then show how this cost model can be integrated seamlessly into the polyhedral representation. This predictive modelling facilitates efficient exploration and educated decision making on how best to apply various polyhedral loop transformations while considering the subsequent effects of different vectorization schemes. Our work demonstrates the feasibility and benefit of tuning the polyhedral model in the context of vectorization. Experimental results confirm that our model makes accurate predictions, providing speedups of over 2.0x on average over traditional innermost-loop vectorization on PowerPC970 and Cell-SPU SIMD platforms.

  • PACT - Outer-Loop Vectorization: revisited for short SIMD architectures
    Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
    Co-Authors: Dorit Nuzman, Ayal Zaks
    Abstract:

    Vectorization has been an important method of using data-level parallelism to accelerate scientific workloads on vector machines such as the Cray for the past three decades. In the last decade it has also proven useful for accelerating multimedia and embedded applications on short SIMD architectures such as MMX, SSE and AltiVec. Most of the focus has been directed at innermost loops, effectively executing their iterations concurrently as much as possible. Outer-loop vectorization refers to vectorizing a level of a loop nest other than the innermost, which can be beneficial if the outer loop exhibits greater data-level parallelism and locality than the innermost loop. Outer-loop vectorization has traditionally been performed by interchanging an outer loop with the innermost loop, followed by vectorizing it at the innermost position. A more direct unroll-and-jam approach can be used to vectorize an outer loop without involving loop interchange, which can be especially suitable for short SIMD architectures. In this paper we revisit the method of outer-loop vectorization, paying special attention to properties of modern short SIMD architectures. We show that even though current optimizing compilers for such targets do not apply outer-loop vectorization in general, it can provide significant performance improvements over innermost-loop vectorization. Our implementation of direct outer-loop vectorization, available in GCC 4.3, achieves speedup factors of 3.13 and 2.77 on average across a set of benchmarks, compared to 1.53 and 1.39 achieved by innermost-loop vectorization, when running on Cell BE SPU and PowerPC970 processors respectively. Moreover, outer-loop vectorization provides new reuse opportunities that can be vital for such short SIMD architectures, including efficient handling of alignment. We present an optimization tapping such opportunities, capable of further boosting the performance obtained by outer-loop vectorization to achieve average speedup factors of 5.26 and 3.64.

  • Outer-Loop Vectorization - revisited for short SIMD architectures
    2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), 2008
    Co-Authors: Dorit Nuzman, Ayal Zaks

Dorit Nuzman - One of the best experts on this subject based on the ideXlab platform.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

  • PACT - Outer-Loop Vectorization: revisited for short SIMD architectures
    Proceedings of the 17th international conference on Parallel architectures and compilation techniques - PACT '08, 2008
    Co-Authors: Dorit Nuzman, Ayal Zaks

  • Outer-Loop Vectorization - revisited for short SIMD architectures
    2008 International Conference on Parallel Architectures and Compilation Techniques (PACT), 2008
    Co-Authors: Dorit Nuzman, Ayal Zaks

Ira Rosen - One of the best experts on this subject based on the ideXlab platform.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

Konrad Trifunovic - One of the best experts on this subject based on the ideXlab platform.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

Albert Cohen - One of the best experts on this subject based on the ideXlab platform.

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen

  • Polyhedral-Model Guided Loop-Nest Auto-Vectorization
    2009 18th International Conference on Parallel Architectures and Compilation Techniques, 2009
    Co-Authors: Konrad Trifunovic, Dorit Nuzman, Ayal Zaks, Albert Cohen, Ira Rosen