The Experts below are selected from a list of 171,465 Experts worldwide ranked by the ideXlab platform
David Gregg - One of the best experts on this subject based on the ideXlab platform.
-
PACT - An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. Co-Authors: Shixiong Xu, David Gregg. Abstract: Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping the enclosed nested parallelism to GPU threads in C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. This mapping problem is twofold: finding suitable execution models and efficient mapping strategies for the nested parallelism.
Andreas Wolf - One of the best experts on this subject based on the ideXlab platform.
-
SAC - A class of OpenMP applications involving nested Parallelism
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC '04), 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
Shixiong Xu - One of the best experts on this subject based on the ideXlab platform.
-
PACT - An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs
2015 International Conference on Parallel Architecture and Compilation (PACT), 2015. Co-Authors: Shixiong Xu, David Gregg. Abstract: Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping the enclosed nested parallelism to GPU threads in C-to-CUDA compilation (OpenACC in this paper) is becoming increasingly important. This mapping problem is twofold: finding suitable execution models and efficient mapping strategies for the nested parallelism.
H. Martin Bücker - One of the best experts on this subject based on the ideXlab platform.
-
A Class of OpenMP Applications Involving Nested Parallelism
ACM Symposium on Applied Computing, 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.
Arno Rasch - One of the best experts on this subject based on the ideXlab platform.
-
SAC - A class of OpenMP applications involving nested Parallelism
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC '04), 2004. Co-Authors: H. Martin Bücker, Arno Rasch, Andreas Wolf. Abstract: Today, OpenMP is the de facto standard for portable shared-memory programming supporting multiple levels of parallelism. Unfortunately, most current OpenMP implementations are not capable of fully exploiting more than one level of parallelism. With the increasing number of processors available in high-performance computing resources, the number of applications that would benefit from multilevel parallelism is also increasing. Applying automatic differentiation to OpenMP programs is introduced as a new class of OpenMP applications with nested parallelism.