Parallelism

The experts below are selected from a list of 171,465 experts worldwide, ranked by the ideXlab platform.

David Gregg - One of the best experts on this subject based on the ideXlab platform.

  • An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
    2015 International Conference on Parallel Architecture and Compilation (PACT), 2015
    Co-Authors: Shixiong Xu, David Gregg
    Abstract:

    Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping this nested parallelism to GPU threads during C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. The mapping problem is twofold: choosing a suitable execution model and devising efficient strategies for mapping the nested parallelism.
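
    To make "nested thread-level parallelism" concrete, here is a minimal C/OpenACC sketch (an illustration added for this listing, not code from the paper; the function name and signature are assumptions): a CSR sparse matrix-vector product for a square matrix, whose outer loop over rows and data-dependent inner loop over each row's non-zeros form exactly the kind of nested loop the abstract describes. One common mapping assigns the outer loop to gangs and the inner loop to vector lanes, which an OpenACC compiler lowers to CUDA thread blocks and threads.

        /* Illustrative sketch only (not from the paper): nested thread-level
         * parallelism in a CSR sparse matrix-vector product (square matrix).
         * Outer level: one gang (CUDA thread block) per row.
         * Inner level: vector lanes (CUDA threads) over the row's non-zeros,
         * whose count varies from row to row. */
        void spmv_csr(int nrows, int nnz,
                      const int *row_ptr, const int *col_idx,
                      const double *val, const double *x, double *y)
        {
            #pragma acc parallel loop gang \
                copyin(row_ptr[0:nrows+1], col_idx[0:nnz], val[0:nnz], x[0:nrows]) \
                copyout(y[0:nrows])
            for (int i = 0; i < nrows; ++i) {                     /* outer level */
                double sum = 0.0;
                #pragma acc loop vector reduction(+:sum)
                for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j) /* inner level */
                    sum += val[j] * x[col_idx[j]];
                y[i] = sum;
            }
        }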

Andreas Wolf - One of the best experts on this subject based on the ideXlab platform.

  • A Class of OpenMP Applications Involving Nested Parallelism
    ACM Symposium on Applied Computing, 2004
    Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf
    Abstract:

    Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also growing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
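
    For readers unfamiliar with nested OpenMP, the short C program below (an illustration added for this listing, not code from the paper) opens a parallel region inside another parallel region; unless the runtime is told to allow more than one active level, the inner region runs on a single thread, which is the "more than one level" limitation the abstract points out.

        /* Minimal illustration of nested OpenMP parallelism (not from the paper). */
        #include <omp.h>
        #include <stdio.h>

        int main(void)
        {
            /* Allow two active nesting levels (OpenMP 3.0 and later);
             * older codes used omp_set_nested(1) for the same purpose. */
            omp_set_max_active_levels(2);

            #pragma omp parallel num_threads(2)           /* outer level */
            {
                int outer = omp_get_thread_num();

                #pragma omp parallel num_threads(2)       /* inner level */
                {
                    printf("outer thread %d, inner thread %d\n",
                           outer, omp_get_thread_num());
                }
            }
            return 0;
        }

    With nesting enabled this prints four outer/inner combinations; with nesting disabled, still the default behaviour of some implementations, only two lines appear, one per outer thread.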

Shixiong Xu - One of the best experts on this subject based on the ideXlab platform.

  • An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
    2015 International Conference on Parallel Architecture and Compilation (PACT), 2015
    Co-Authors: Shixiong Xu, David Gregg
    Abstract:

    Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping this nested parallelism to GPU threads during C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. The mapping problem is twofold: choosing a suitable execution model and devising efficient strategies for mapping the nested parallelism.
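
    The abstract's second concern, mapping strategies, can be illustrated with a short C/OpenACC fragment (again an illustration added for this listing, not from the paper; the function name and parameters are assumptions): when the loop nest is rectangular, an alternative to mapping the two levels onto separate gang and vector dimensions is to collapse them into a single flat iteration space and let the compiler distribute it.

        /* Illustrative sketch (not from the paper): for a rectangular nest with
         * a fixed inner trip count, an alternative mapping strategy collapses
         * both levels into one iteration space instead of mapping them to
         * separate thread dimensions. */
        void scale_rows(int n, int m, double *restrict a, const double *restrict s)
        {
            #pragma acc parallel loop collapse(2) copy(a[0:n*m]) copyin(s[0:n])
            for (int i = 0; i < n; ++i)            /* outer level */
                for (int j = 0; j < m; ++j)        /* inner level */
                    a[i * m + j] *= s[i];          /* scale row i by s[i] */
        }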

H. Martin Bücker - One of the best experts on this subject based on the ideXlab platform.

  • A Class of OpenMP Applications Involving Nested Parallelism
    ACM Symposium on Applied Computing, 2004
    Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf
    Abstract:

    Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also growing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
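
    To make the abstract's last sentence concrete, the sketch below (a hypothetical illustration added for this listing, not code from the paper) shows where the extra level comes from: forward-mode automatic differentiation augments each value with several directional derivatives, so the differentiated version of an already OpenMP-parallel loop contains an additional loop over derivative directions that can run as a second, nested parallel level.

        /* Hypothetical sketch (not from the paper): forward-mode AD applied to an
         * OpenMP-parallel loop.  Original code: y[i] = x[i]*x[i], parallel over i.
         * The differentiated code also propagates NDIR directional derivatives,
         * dy[i][d] = 2*x[i]*dx[i][d]; the loop over d is a natural second level
         * of parallelism. */
        #include <omp.h>

        #define NDIR 4   /* number of derivative directions (illustrative choice) */

        void f_and_derivatives(int n, const double *x, double dx[][NDIR],
                               double *y, double dy[][NDIR])
        {
            omp_set_max_active_levels(2);          /* enable the second level */

            #pragma omp parallel for               /* level 1: original parallelism */
            for (int i = 0; i < n; ++i) {
                y[i] = x[i] * x[i];

                #pragma omp parallel for           /* level 2: derivative directions */
                for (int d = 0; d < NDIR; ++d)
                    dy[i][d] = 2.0 * x[i] * dx[i][d];
            }
        }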

Arno Rasch - One of the best experts on this subject based on the ideXlab platform.

  • A Class of OpenMP Applications Involving Nested Parallelism
    ACM Symposium on Applied Computing, 2004
    Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf
    Abstract:

    Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also growing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
