Task Parallelism

The Experts below are selected from a list of 15516 Experts worldwide ranked by ideXlab platform

Vivek Sarkar - One of the best experts on this subject based on the ideXlab platform.

  • OpenSHMEM - Integrating Asynchronous Task Parallelism with OpenSHMEM
    OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016
    Co-Authors: Max Grossman, Vivek Kumar, Zoran Budimlic, Vivek Sarkar
    Abstract:

    Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on the use of multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node Parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes. This situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories.
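The intra-node "X" layer this abstract critiques is ordinarily written in C against OpenMP or pthreads on top of the OpenSHMEM library. As an illustration only, the core pattern — a node's share of the data fanned out to worker threads whose scheduling the communication runtime cannot see — can be sketched in Python; the names (`node_local_compute`, `process_chunk`) are hypothetical, not part of any OpenSHMEM API:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # stand-in for a compute kernel over one slice of this node's data
    return sum(x * x for x in chunk)

def node_local_compute(data, num_workers=4):
    # Split the node's share of the data among intra-node workers.
    # The inter-node communication runtime sees none of this scheduling,
    # which is exactly the opacity the abstract identifies as a problem.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(process_chunk, chunks))
```

The load imbalance concern follows directly: if one chunk is slower than the rest, the communication layer cannot help rebalance, because it never sees the intra-node tasks.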

  • integrating asynchronous Task Parallelism with openshmem
    Workshop on OpenSHMEM and Related Technologies, 2016
    Co-Authors: Max Grossman, Vivek Kumar, Zoran Budimlic, Vivek Sarkar
    Abstract:

    Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on the use of multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node Parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes. This situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories.

  • SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism
    International Journal of Parallel Programming, 2016
    Co-Authors: Dragoş Sbîrlea, Jun Shirako, Ryan Newton, Vivek Sarkar
    Abstract:

    Stream processing is a special form of the dataflow execution model that offers extensive opportunities for optimization and automatic parallelization. To take full advantage of the paradigm, programmers are typically required to learn a new language and re-implement their applications. This work shows that it is possible to exploit streaming as a safe and automatic optimization of a more general dataflow-based model, one in which computation kernels are written in standard, general-purpose languages and organized as a coordination graph. We propose streaming concurrent collections (SCnC), a streaming system that can efficiently run a subset of programs supported by concurrent collections (CnC). CnC is a general-purpose parallel programming paradigm that integrates Task Parallelism and dataflow computing. The proposed streaming support allows application developers to reason about their program as a general dataflow graph, while benefiting from the performance and tight memory footprint of stream Parallelism when their program satisfies streaming constraints. In this paper, we formally define the application requirements for using SCnC, and outline a static decision procedure for identifying and processing eligible SCnC subgraphs. We present initial results showing that transitioning from general CnC to SCnC leads to a throughput increase of up to 40× for certain benchmarks, and also enables programs with large data sizes to execute in available memory for cases where CnC execution may run out of memory.
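The "tight memory footprint" claim rests on connecting kernels with bounded channels instead of materializing whole item collections. A minimal two-kernel sketch, in Python rather than CnC (the function and kernel names are hypothetical):

```python
from queue import Queue
from threading import Thread

def run_stream(items, capacity=4):
    # Two kernels of a coordination graph, producer -> consumer, joined by
    # a bounded channel: live memory is proportional to `capacity`, not to
    # the total number of items, which is the streaming advantage over
    # materializing the whole collection.
    channel = Queue(maxsize=capacity)
    SENTINEL = object()
    results = []

    def producer():
        for x in items:
            channel.put(x * 2)        # first kernel: scale
        channel.put(SENTINEL)         # signal end of stream

    def consumer():
        while True:
            x = channel.get()
            if x is SENTINEL:
                break
            results.append(x + 1)     # second kernel: shift

    t = Thread(target=producer)
    t.start()
    consumer()
    t.join()
    return results
```

The static decision procedure the paper describes would be what certifies that a CnC subgraph can safely be executed in this bounded-buffer style.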

  • Elastic Tasks: Unifying Task Parallelism and SPMD Parallelism with an Adaptive Runtime
    European Conference on Parallel Processing, 2015
    Co-Authors: Alina Sbirlea, Kunal Agrawal, Vivek Sarkar
    Abstract:

    In this paper, we introduce elastic Tasks, a new high-level parallel programming primitive that can be used to unify Task Parallelism and SPMD Parallelism in a common adaptive scheduling framework. Elastic Tasks are internally parallel Tasks and can run on a single worker or expand to take over multiple workers. An elastic Task can be an ordinary Task or an SPMD region that must be executed by one or more workers simultaneously, in a tightly coupled manner.
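The defining property of an elastic task is that its body is written once against a worker count chosen by the scheduler, so running it on one worker or on several yields the same result. A hedged sketch of that contract in Python (the names `elastic_reduce` and `shard` are illustrative, not from the paper's runtime):

```python
from concurrent.futures import ThreadPoolExecutor

def elastic_reduce(data, num_workers):
    # An "elastic" task body: parameterized by (worker_id, num_workers),
    # so the scheduler is free to run it on a single worker or expand it
    # across several; the result is identical either way.
    def shard(worker_id):
        return sum(data[worker_id::num_workers])
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(shard, range(num_workers)))
```

An adaptive runtime would pick `num_workers` at dispatch time based on load; the invariant to preserve is that every choice covers the input exactly once.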

Eduard Ayguade - One of the best experts on this subject based on the ideXlab platform.

  • PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite
    ACM Transactions on Architecture and Code Optimization, 2016
    Co-Authors: Dimitrios Chasapis, Eduard Ayguade, Marc Casas, Miquel Moreto, Raul Vidal, Jesús Labarta, Mateo Valero
    Abstract:

    In this work, we show how parallel applications can be implemented efficiently using Task Parallelism. We also evaluate the benefits of this parallel paradigm with respect to other approaches. We use the PARSEC benchmark suite as our test bed, which includes applications representative of a wide range of domains, from HPC to desktop and server applications. We adopt different parallelization techniques, tailored to the needs of each application, to fully exploit the Task-based model. Our evaluation shows that Task Parallelism achieves better performance than thread-based parallelization models, such as Pthreads. Our experimental results show that we can obtain scalability improvements of up to 42% on a 16-core system and code size reductions of up to 81%. These reductions are achieved by removing application-specific schedulers or thread-pooling systems from the source code and transferring those responsibilities to the runtime system software.

  • LCPC - Unrolling loops containing Task Parallelism
    Languages and Compilers for Parallel Computing, 2010
    Co-Authors: Roger Ferrer, Alejandro Duran, Xavier Martorell, Eduard Ayguade
    Abstract:

    Classic loop unrolling improves the performance of sequential loops by reducing the overheads of the non-computational parts of the loop. Unfortunately, when the loop contains Parallelism inside, most compilers will ignore it or perform a naive transformation. We propose extending the semantics of the loop unrolling transformation to cover loops that contain Task Parallelism. In these cases, the transformation tries to aggregate the multiple Tasks that appear after a classic unrolling phase, to reduce the overheads per iteration. We present an implementation of such extended loop unrolling for OpenMP Tasks with two phases: a classical unroll followed by a Task aggregation phase. Our aggregation technique covers the special cases where Task Parallelism appears inside branches or where the loop is uncountable. Our experimental results show that using this extended unroll allows loops with fine-grained Tasks to reduce the overheads associated with Task creation and to scale much better.
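The paper's transformation operates on OpenMP tasks in C; the before/after shapes can be sketched language-agnostically. Below, a Python illustration with hypothetical names: `run_fine_grained` spawns one task per iteration, while `run_aggregated` mimics unrolling by a factor and then fusing each unrolled group into a single task, cutting per-iteration spawn overhead:

```python
from concurrent.futures import ThreadPoolExecutor

def process(i):
    # a small per-iteration body; task-creation cost dominates at this grain
    return i * i

def run_fine_grained(n, pool):
    # before: one task per iteration
    futures = [pool.submit(process, i) for i in range(n)]
    return sum(f.result() for f in futures)

def run_aggregated(n, pool, unroll=4):
    # after: unroll by `unroll`, then aggregate each group of iterations
    # into one task, so the spawn overhead is paid once per group
    def group(start):
        return sum(process(i) for i in range(start, min(start + unroll, n)))
    futures = [pool.submit(group, s) for s in range(0, n, unroll)]
    return sum(f.result() for f in futures)
```

Both versions compute the same result; the aggregated form issues n/unroll task spawns instead of n, which is the entire point of the transformation for fine-grained bodies.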

  • Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP
    International Conference on Parallel Processing, 2009
    Co-Authors: Alejandro Duran, Roger Ferrer, Xavier Martorell, Xavier Teruel, Eduard Ayguade
    Abstract:

    Traditional parallel applications have exploited regular Parallelism, based on parallel loops; only a few applications exploit sections Parallelism. With the release of the new OpenMP specification (3.0), the programming model supports Tasking. Parallel Tasks allow the exploitation of irregular Parallelism, but there is a lack of benchmarks exploiting Tasks in OpenMP. With current (and projected) multicore architectures that offer many more alternatives for executing parallel applications than traditional SMP machines, this kind of Parallelism is increasingly important, and so is the need for a set of benchmarks to evaluate it. In this paper, we motivate the need for such a benchmark suite for irregular and/or recursive Task Parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular Parallelism based on Tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the experiments that can be done with the different compilation and runtime alternatives of the benchmarks.
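The recursive, irregular parallelism BOTS targets is the kind that parallel loops cannot express: each call spawns child tasks and waits for them, mirroring OpenMP's `task` and `taskwait` constructs. A Python analogue of that spawn/join shape (BOTS itself is C/OpenMP; one thread per task as done here is illustrative, not how an OpenMP runtime schedules tasks):

```python
from threading import Thread

def fib_task(n, out, i):
    # each invocation is a task; children are spawned as new tasks and
    # joined before combining, mirroring OpenMP `task` + `taskwait`
    if n < 2:
        out[i] = n
        return
    res = [0, 0]
    kids = [Thread(target=fib_task, args=(n - 1, res, 0)),
            Thread(target=fib_task, args=(n - 2, res, 1))]
    for t in kids:
        t.start()          # spawn child tasks
    for t in kids:
        t.join()           # taskwait: block until children complete
    out[i] = res[0] + res[1]

def fib(n):
    out = [0]
    fib_task(n, out, 0)
    return out[0]
```

The task tree's shape depends on the input, which is exactly why loop-based benchmarks cannot evaluate this style and a dedicated suite is needed.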

Pedro Palomo - One of the best experts on this subject based on the ideXlab platform.

  • A Process Network Model for Reactive Streaming Software with Deterministic Task Parallelism
    Fundamental Approaches to Software Engineering, 2018
    Co-Authors: Fotios Gioulekas, Peter Poplavko, Panagiotis Katsaros, Saddek Bensalem, Pedro Palomo
    Abstract:

    A formal semantics is introduced for a Process Network model that combines streaming and reactive control processing with Task Parallelism properties suited to exploiting multi-cores. Applications that react to environment stimuli are implemented by communicating sporadic and periodic Tasks, programmed independently of an execution platform. Two functionally equivalent semantics are defined, one for sequential execution and one for real-time execution. The former ensures functional determinism by implying precedence constraints between jobs (Task executions); hence, the program outputs are independent of the Task scheduling. The latter specifies concurrent execution on a real-time platform that guarantees all of the model's constraints; it has been implemented in an executable formal specification language. The model's implementation runs on multi-core embedded systems and supports the integration of run-time managers for shared HW/SW resources (e.g., for controlling QoS, resource interference, or power consumption). Finally, a model transformation approach has been developed that allowed a real spacecraft on-board application to be ported and statically scheduled on an industrial multi-core platform.
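The determinism argument is that precedence constraints between jobs pin down the observable outputs, so any schedule the runtime picks yields the same result. A minimal sketch of that property in Python, with hypothetical names (`run_jobs`, `deps`), not the paper's actual formalism:

```python
def run_jobs(tasks, deps):
    """tasks: name -> function mutating a shared state dict.
    deps:  name -> prerequisite task names (precedence constraints).
    Each task runs exactly once, after its prerequisites; the final state
    is therefore the same for every schedule consistent with `deps`."""
    state, done = {}, set()

    def run(name):
        if name in done:
            return
        for d in deps.get(name, ()):
            run(d)               # honor the precedence constraint
        tasks[name](state)
        done.add(name)

    for name in tasks:           # visiting order is arbitrary; output is not
        run(name)
    return state
```

A real-time implementation replaces the recursive walk with a concurrent scheduler, but correctness still reduces to never running a job before its predecessors.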

Panagiotis Katsaros - One of the best experts on this subject based on the ideXlab platform.

  • A Process Network Model for Reactive Streaming Software with Deterministic Task Parallelism
    Fundamental Approaches to Software Engineering, 2018
    Co-Authors: Fotios Gioulekas, Peter Poplavko, Panagiotis Katsaros, Saddek Bensalem, Pedro Palomo
    Abstract:

    A formal semantics is introduced for a Process Network model that combines streaming and reactive control processing with Task Parallelism properties suited to exploiting multi-cores. Applications that react to environment stimuli are implemented by communicating sporadic and periodic Tasks, programmed independently of an execution platform. Two functionally equivalent semantics are defined, one for sequential execution and one for real-time execution. The former ensures functional determinism by implying precedence constraints between jobs (Task executions); hence, the program outputs are independent of the Task scheduling. The latter specifies concurrent execution on a real-time platform that guarantees all of the model's constraints; it has been implemented in an executable formal specification language. The model's implementation runs on multi-core embedded systems and supports the integration of run-time managers for shared HW/SW resources (e.g., for controlling QoS, resource interference, or power consumption). Finally, a model transformation approach has been developed that allowed a real spacecraft on-board application to be ported and statically scheduled on an industrial multi-core platform.

P. Sadayappan - One of the best experts on this subject based on the ideXlab platform.

  • Scioto: A Framework for Global-View Task Parallelism
    2008 37th International Conference on Parallel Processing, 2008
    Co-Authors: James Dinan, Sriram Krishnamoorthy, Brian D. Larkins, Jarek Nieplocha, P. Sadayappan
    Abstract:

    We introduce Scioto (shared collections of Task objects), a lightweight framework for providing Task management on distributed-memory machines under one-sided and global-view parallel programming models. Scioto provides locality-aware dynamic load balancing and interoperates with MPI, ARMCI, and Global Arrays. Additionally, Scioto's Task model and programming interface are compatible with many other existing parallel models, including UPC, SHMEM, and CAF. Through Task Parallelism, the Scioto framework provides a solution for overcoming irregularity, load imbalance, and heterogeneity, as well as for the dynamic mapping of computation onto emerging architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the unbalanced tree search (UTS) benchmark and two quantum chemistry codes: the closed-shell self-consistent field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.
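The central abstraction, a shared collection of task objects drained cooperatively by workers, can be sketched on a single node. This Python illustration is an assumption-laden analogue, not Scioto's C API (Scioto's collection is distributed and its load balancing is locality-aware; the names `execute_collection` and `todo` are hypothetical):

```python
from queue import Queue, Empty
from threading import Thread

def execute_collection(task_fns, num_workers=4):
    # A shared collection of task objects: workers pull tasks until the
    # collection is empty, which naturally balances irregular task costs.
    todo = Queue()
    for i, fn in enumerate(task_fns):
        todo.put((i, fn))
    results = [None] * len(task_fns)

    def worker():
        while True:
            try:
                i, fn = todo.get_nowait()
            except Empty:
                return            # collection drained; worker retires
            results[i] = fn()     # each slot written by exactly one worker

    workers = [Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results
```

Because workers pull work rather than being assigned fixed shares, a few expensive tasks do not idle the rest of the pool, which is the load-imbalance problem the framework targets at distributed scale.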
