The Experts below are selected from a list of 171,465 Experts worldwide ranked by the ideXlab platform
David Gregg - One of the best experts on this subject based on the ideXlab platform.
-
PACT - An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. Co-Authors: Shixiong Xu, David Gregg. Abstract: Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping the enclosed nested parallelism to GPU threads in C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. This mapping problem is twofold: finding suitable execution models and efficient mapping strategies for the nested parallelism.
Andreas Wolf - One of the best experts on this subject based on the ideXlab platform.
-
SAC - A class of OpenMP applications involving nested Parallelism
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC '04), 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
Shixiong Xu - One of the best experts on this subject based on the ideXlab platform.
-
PACT - An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. Co-Authors: Shixiong Xu, David Gregg. Abstract: Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping the enclosed nested parallelism to GPU threads in C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. This mapping problem is twofold: finding suitable execution models and efficient mapping strategies for the nested parallelism.
H. Martin Bücker - One of the best experts on this subject based on the ideXlab platform.
-
A Class of OpenMP Applications Involving Nested Parallelism
ACM Symposium on Applied Computing, 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
Arno Rasch - One of the best experts on this subject based on the ideXlab platform.
-
SAC - A class of OpenMP applications involving nested Parallelism
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC '04), 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.