The experts below are selected from a list of 15,516 experts worldwide ranked by the ideXlab platform.
Vivek Sarkar - One of the best experts on this subject based on the ideXlab platform.
- OpenSHMEM - Integrating Asynchronous Task Parallelism with OpenSHMEM
  OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016
  Co-Authors: Max Grossman, Vivek Kumar, Zoran Budimlic, Vivek Sarkar
  Abstract: Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes, and the situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories.
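The key contrast in the abstract is between threads that are opaque to the communication runtime and tasks the runtime can see and schedule. The sketch below is a minimal intra-node task runtime of our own construction (it is not the paper's AsyncSHMEM API): worker threads drain one shared work queue, so every unit of work passes through the runtime rather than being hidden inside pthreads or OpenMP regions.

```python
# Conceptual sketch (not the paper's API): a tiny intra-node task runtime.
# Because all tasks flow through one queue the runtime owns, it could, in
# principle, coordinate them with communication, unlike opaque threads.
import threading
import queue

class TaskRuntime:
    def __init__(self, n_workers=4):
        self.tasks = queue.Queue()
        self.workers = [threading.Thread(target=self._worker)
                        for _ in range(n_workers)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            task = self.tasks.get()
            if task is None:        # shutdown sentinel
                break
            task()                  # run the asynchronous task
            self.tasks.task_done()

    def async_task(self, fn, *args):
        """Spawn fn(*args) as an asynchronous task."""
        self.tasks.put(lambda: fn(*args))

    def shutdown(self):
        self.tasks.join()           # wait for all submitted tasks
        for _ in self.workers:
            self.tasks.put(None)
        for w in self.workers:
            w.join()
```

A caller spawns tasks with `async_task` and calls `shutdown()` to wait for completion; the worker count, not the task count, bounds the number of OS threads.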
- SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism
  International Journal of Parallel Programming, 2016
  Co-Authors: Dragoş Sbîrlea, Jun Shirako, Ryan Newton, Vivek Sarkar
  Abstract: Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelization. To take full advantage of the paradigm, programmers are typically required to learn a new language and re-implement their applications. This work shows that it is possible to exploit streaming as a safe and automatic optimization of a more general dataflow-based model, one in which computation kernels are written in standard, general-purpose languages and organized as a coordination graph. We propose streaming concurrent collections (SCnC), a streaming system that can efficiently run a subset of programs supported by concurrent collections (CnC). CnC is a general-purpose parallel programming paradigm that integrates task parallelism and dataflow computing. The proposed streaming support allows application developers to reason about their program as a general dataflow graph, while benefiting from the performance and tight memory footprint of stream parallelism when their program satisfies streaming constraints. In this paper, we formally define the application requirements for using SCnC and outline a static decision procedure for identifying and processing eligible SCnC subgraphs. We present initial results showing that transitioning from general CnC to SCnC leads to a throughput increase of up to 40× for certain benchmarks, and also enables programs with large data sizes to execute in available memory in cases where CnC execution may run out of memory.
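The "tight memory footprint" claim comes from streaming execution over bounded channels: memory is proportional to channel capacity, not data size. The following sketch (our own construction, not SCnC's actual interface) wires general-purpose kernels into a pipeline of bounded FIFO channels:

```python
# Conceptual sketch of streaming execution (not SCnC's API): kernels written
# as ordinary Python functions are connected by bounded FIFO channels, so
# only `capacity` items per channel are ever buffered at once.
import threading
import queue

DONE = object()  # end-of-stream marker

def kernel(fn, inp, out):
    """Apply fn to every item on the input channel, forwarding downstream."""
    while (item := inp.get()) is not DONE:
        out.put(fn(item))
    out.put(DONE)

def run_pipeline(items, stages, capacity=4):
    chans = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]

    def feed():
        for x in items:
            chans[0].put(x)
        chans[0].put(DONE)

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=kernel, args=(fn, chans[i], chans[i + 1]))
                for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    out = []
    while (item := chans[-1].get()) is not DONE:
        out.append(item)
    for t in threads:
        t.join()
    return out
```

Because every channel is bounded, an arbitrarily long input stream flows through in constant memory, which is the property SCnC exploits for eligible subgraphs.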
- Elastic Tasks: Unifying Task Parallelism and SPMD Parallelism with an Adaptive Runtime
  European Conference on Parallel Processing, 2015
  Co-Authors: Alina Sbirlea, Kunal Agrawal, Vivek Sarkar
  Abstract: In this paper, we introduce elastic tasks, a new high-level parallel programming primitive that can be used to unify task parallelism and SPMD parallelism in a common adaptive scheduling framework. Elastic tasks are internally parallel tasks and can run on a single worker or expand to take over multiple workers. An elastic task can be an ordinary task or an SPMD region that must be executed by one or more workers simultaneously, in a tightly coupled manner.
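The defining property of an elastic task is that its result does not depend on how many workers the scheduler grants it. A minimal sketch of that contract (names and structure are ours, not the paper's API):

```python
# Illustrative sketch (our own names, not the paper's runtime): an elastic
# task partitions its work across however many workers the scheduler grants,
# from w = 1 (ordinary task) up to many (SPMD-style region).
import threading

def run_elastic(work_items, granted_workers, body):
    """Execute body(item) over work_items on `granted_workers` threads."""
    results = []
    lock = threading.Lock()

    def worker(chunk):
        local = [body(x) for x in chunk]  # each worker handles its slice
        with lock:
            results.extend(local)

    # Strided partition: worker i takes items i, i+w, i+2w, ...
    chunks = [work_items[i::granted_workers] for i in range(granted_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The result set is identical whether the scheduler grants one worker or several; only the execution time changes, which is what lets a runtime choose the width adaptively.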
Eduard Ayguade - One of the best experts on this subject based on the ideXlab platform.
- PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite
  ACM Transactions on Architecture and Code Optimization, 2016
  Co-Authors: Dimitrios Chasapis, Eduard Ayguade, Marc Casas, Miquel Moreto, Raul Vidal, Jesús Labarta, Mateo Valero
  Abstract: In this work, we show how parallel applications can be implemented efficiently using task parallelism, and we evaluate the benefits of this paradigm with respect to other approaches. We use the PARSEC benchmark suite as our test bed, which includes applications representative of a wide range of domains, from HPC to desktop and server applications. We adopt different parallelization techniques, tailored to the needs of each application, to fully exploit the task-based model. Our evaluation shows that task parallelism achieves better performance than thread-based parallelization models such as Pthreads. Our experimental results show scalability improvements of up to 42% on a 16-core system and code size reductions of up to 81%. These reductions are achieved by removing application-specific schedulers or thread-pooling systems from the source code and transferring those responsibilities to the runtime system software.
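The code-size reductions come from deleting hand-rolled thread pools and schedulers and letting a tasking runtime own them. A small illustration of that division of labor (the kernel below is a toy stand-in, not a PARSEC workload):

```python
# Sketch of the paper's code-size argument: the application declares
# independent units of work; the runtime (here Python's executor, standing
# in for a tasking runtime) owns threads, queues, and scheduling.
from concurrent.futures import ThreadPoolExecutor

def halve_row(row):
    """Toy per-row kernel standing in for an application computation."""
    return [v // 2 for v in row]

def task_parallel_apply(image, max_workers=4):
    # No application-specific worker loop, condition variables, or thread
    # pool management: the runtime schedules one task per row.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(halve_row, image))
```

A Pthreads-style version of the same loop would need explicit thread creation, work partitioning, and joining; here all of that lives in the runtime.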
- LCPC - Unrolling Loops Containing Task Parallelism
  Languages and Compilers for Parallel Computing, 2010
  Co-Authors: Roger Ferrer, Alejandro Duran, Xavier Martorell, Eduard Ayguade
  Abstract: Classic loop unrolling increases the performance of sequential loops by reducing the overhead of the non-computational parts of the loop. Unfortunately, when the loop contains parallelism inside, most compilers will ignore it or perform a naive transformation. We propose extending the semantics of the loop unrolling transformation to cover loops that contain task parallelism. In these cases, the transformation tries to aggregate the multiple tasks that appear after a classic unrolling phase, to reduce the overhead per iteration. We present an implementation of such extended loop unrolling for OpenMP tasks with two phases: a classical unroll followed by a task aggregation phase. Our aggregation technique covers the special cases where task parallelism appears inside branches or where the loop is uncountable. Our experimental results show that using this extended unroll allows loops with fine-grained tasks to reduce the overheads associated with task creation and to scale much better.
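The effect of the two-phase transformation (unroll, then aggregate) can be sketched directly: instead of spawning one task per iteration, spawn one task per group of U iterations, plus a remainder task. This toy version is ours, not the paper's compiler output:

```python
# Sketch of unrolling + task aggregation (our toy model, not the paper's
# OpenMP implementation): the aggregated version creates one task per
# `unroll` iterations instead of one per iteration.
def spawn_tasks_naive(n, body, spawn):
    for i in range(n):
        spawn(lambda i=i: body(i))      # one task per iteration

def spawn_tasks_aggregated(n, body, spawn, unroll=4):
    i = 0
    while i + unroll <= n:              # unrolled portion
        lo = i                          # bind loop variable per task
        spawn(lambda lo=lo: [body(j) for j in range(lo, lo + unroll)])
        i += unroll
    if i < n:                           # remainder loop
        spawn(lambda lo=i: [body(j) for j in range(lo, n)])
```

With n = 10 and unroll = 4, the naive version creates 10 tasks while the aggregated one creates 3 (covering 4 + 4 + 2 iterations), cutting per-task creation overhead for fine-grained bodies.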
- Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP
  International Conference on Parallel Processing, 2009
  Co-Authors: Alejandro Duran, Roger Ferrer, Xavier Martorell, Xavier Teruel, Eduard Ayguade
  Abstract: Traditional parallel applications have exploited regular parallelism, based on parallel loops; only a few applications exploit sections parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting tasks in OpenMP. With current (and projected) multicore architectures, which offer many more alternatives for executing parallel applications than traditional SMP machines, this kind of parallelism is increasingly important, and so is the need for a set of benchmarks to evaluate it. In this paper, we motivate the need for such a benchmark suite for irregular and/or recursive task parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular parallelism based on tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the experiments that can be done with the different compilation and runtime alternatives of the benchmarks.
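The recursive, irregular parallelism BOTS targets is typified by its Fibonacci benchmark: each call spawns a task for one branch and uses a sequential cutoff below which task creation is not worth its overhead. A hedged sketch of that pattern in Python (not BOTS code, which is C with OpenMP task pragmas):

```python
# Sketch in the spirit of the BOTS fib benchmark (our construction): one
# recursive branch is spawned as a task, the other is computed inline, and
# a cutoff switches to serial execution for small subproblems.
from concurrent.futures import ThreadPoolExecutor

def fib_seq(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_task(n, pool, cutoff=10):
    if n < cutoff:
        return fib_seq(n)                # serial below the cutoff
    child = pool.submit(fib_task, n - 1, pool, cutoff)  # spawn one branch
    other = fib_task(n - 2, pool, cutoff)               # compute the other
    return child.result() + other                       # join

with ThreadPoolExecutor(max_workers=8) as pool:
    result = fib_task(12, pool)
```

The cutoff is the same tuning knob BOTS exposes for its recursive benchmarks: too low and task-creation overhead dominates; too high and parallelism disappears.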
Pedro Palomo - One of the best experts on this subject based on the ideXlab platform.
- A Process Network Model for Reactive Streaming Software with Deterministic Task Parallelism
  Fundamental Approaches to Software Engineering, 2018
  Co-Authors: Fotios Gioulekas, Peter Poplavko, Panagiotis Katsaros, Saddek Bensalem, Pedro Palomo
  Abstract: A formal semantics is introduced for a process network model that combines streaming and reactive control processing with task parallelism properties suited to exploiting multi-cores. Applications that react to environment stimuli are implemented by communicating sporadic and periodic tasks, programmed independently of an execution platform. Two functionally equivalent semantics are defined, one for sequential execution and one for real-time execution. The former ensures functional determinism by implying precedence constraints between jobs (task executions); hence, the program outputs are independent of the task scheduling. The latter specifies concurrent execution on a real-time platform, guaranteeing all of the model's constraints; it has been implemented in an executable formal specification language. The model's implementation runs on multi-core embedded systems and supports integration of run-time managers for shared HW/SW resources (e.g., for controlling QoS, resource interference, or power consumption). Finally, a model transformation approach has been developed, which allowed a real spacecraft on-board application to be ported and statically scheduled on an industrial multi-core platform.
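The determinism claim rests on precedence constraints between jobs: any schedule that respects writer-before-reader edges yields the same outputs. A small sketch of that idea (our own toy encoding, not the paper's formal semantics):

```python
# Conceptual sketch (ours, not the paper's model): each job reads and writes
# named variables, and precedence constraints force writers before readers.
# Any order respecting them yields identical outputs.
def run_schedule(order, jobs, env):
    """Execute jobs in the given order; returns the final environment."""
    env = dict(env)
    for name in order:
        reads, writes, fn = jobs[name]
        env[writes] = fn(*(env[r] for r in reads))
    return env

jobs = {
    "sample": ((), "x", lambda: 7),            # sporadic input job
    "filter": (("x",), "y", lambda x: x + 1),  # must run after sample
    "log":    (("x",), "z", lambda x: 2 * x),  # must also run after sample
}

# Two different schedules, both respecting sample -> {filter, log}:
env1 = run_schedule(["sample", "filter", "log"], jobs, {})
env2 = run_schedule(["sample", "log", "filter"], jobs, {})
```

Here `env1 == env2` regardless of how the scheduler orders the independent jobs, which is the functional-determinism property the sequential semantics guarantees.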
Panagiotis Katsaros - One of the best experts on this subject based on the ideXlab platform.
- A Process Network Model for Reactive Streaming Software with Deterministic Task Parallelism
  Fundamental Approaches to Software Engineering, 2018
  Co-Authors: Fotios Gioulekas, Peter Poplavko, Panagiotis Katsaros, Saddek Bensalem, Pedro Palomo
  Abstract: A formal semantics is introduced for a process network model that combines streaming and reactive control processing with task parallelism properties suited to exploiting multi-cores. Applications that react to environment stimuli are implemented by communicating sporadic and periodic tasks, programmed independently of an execution platform. Two functionally equivalent semantics are defined, one for sequential execution and one for real-time execution. The former ensures functional determinism by implying precedence constraints between jobs (task executions); hence, the program outputs are independent of the task scheduling. The latter specifies concurrent execution on a real-time platform, guaranteeing all of the model's constraints; it has been implemented in an executable formal specification language. The model's implementation runs on multi-core embedded systems and supports integration of run-time managers for shared HW/SW resources (e.g., for controlling QoS, resource interference, or power consumption). Finally, a model transformation approach has been developed, which allowed a real spacecraft on-board application to be ported and statically scheduled on an industrial multi-core platform.
P. Sadayappan - One of the best experts on this subject based on the ideXlab platform.
- Scioto: A Framework for Global-View Task Parallelism
  37th International Conference on Parallel Processing, 2008
  Co-Authors: James Dinan, Sriram Krishnamoorthy, Brian D. Larkins, Jarek Nieplocha, P. Sadayappan
  Abstract: We introduce Scioto (shared collections of task objects), a lightweight framework for providing task management on distributed-memory machines under one-sided and global-view parallel programming models. Scioto provides locality-aware dynamic load balancing and interoperates with MPI, ARMCI, and Global Arrays. Additionally, Scioto's task model and programming interface are compatible with many other existing parallel models, including UPC, SHMEM, and CAF. Through task parallelism, the Scioto framework provides a solution for overcoming irregularity, load imbalance, and heterogeneity, as well as for dynamically mapping computation onto emerging architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the unbalanced tree search (UTS) benchmark and two quantum chemistry codes: the closed-shell self-consistent field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.
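The core abstraction is a shared collection of task objects that workers drain, where a running task may add new tasks; this is what balances irregular workloads like UTS. A single-node sketch of that pattern (our own construction; Scioto itself is a C library for distributed memory):

```python
# Sketch of a shared task collection (not Scioto's C API): workers pull task
# objects from one shared collection, and tasks may insert new tasks, so
# irregular work spreads across workers dynamically.
import threading
import queue

def run_collection(initial, task_fn, n_workers=4):
    """task_fn(item) -> (result, [child items]); returns all results."""
    tasks = queue.Queue()
    results = []
    rlock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:               # shutdown sentinel
                tasks.task_done()
                return
            result, children = task_fn(item)
            with rlock:
                results.append(result)
            for c in children:
                tasks.put(c)               # tasks may create tasks
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for t in initial:
        tasks.put(t)
    tasks.join()                           # waits for spawned children too
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
    return results
```

Running this on an unbalanced-tree expansion (each task spawns its node's children) gives UTS-like behavior: no worker knows the tree shape in advance, yet the collection keeps all workers fed. The real framework adds distributed queues and locality-aware stealing on top of this idea.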