Implicit Synchronization

The experts below are selected from a list of 1,707 experts worldwide, ranked by the ideXlab platform.

Polyvios Pratikakis - One of the best experts on this subject based on the ideXlab platform.

  • Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
    2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016
    Co-Authors: Nikolaos Papakonstantinou, Foivos S. Zakkak, Polyvios Pratikakis
    Abstract:

    This work presents a hierarchical, parallel, dynamic dependence analysis for inferring run-time dependencies between recursively parallel tasks in the OmpSs programming model. To evaluate the dependence analysis, we implement PARTEE, a scalable runtime system that supports implicit synchronization between nested parallel tasks. We evaluate the performance of the resulting runtime system and compare it to Nanos++, the state-of-the-art OmpSs implementation, and Cilk, a high-performance task-parallel runtime system without implicit task synchronization. We find that (i) PARTEE handles finer-grained tasks than Nanos++, (ii) PARTEE's performance is comparable to that of Cilk, and (iii) in cases where task dependencies are irregular, PARTEE outperforms Cilk by up to 103%.
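
    As a concrete illustration of the implicit synchronization the abstract describes, the sketch below uses standard OpenMP 4.0+ task depend clauses rather than the OmpSs/PARTEE API; the variable names and the toy computation are invented for the example. The consumer task is ordered after both producers purely from the declared in/out footprints, with no explicit taskwait or barrier between the tasks.

    /* Minimal sketch of implicit synchronization via declared task
     * footprints.  Standard OpenMP tasking is assumed here, not the
     * OmpSs/PARTEE runtime from the abstract above.
     * Build (assumption): gcc -fopenmp implicit_sync.c */
    #include <stdio.h>

    int main(void) {
        int a = 0, b = 0, sum = 0;

        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: a)        /* producer of a */
            a = 40;

            #pragma omp task depend(out: b)        /* producer of b */
            b = 2;

            /* Runs only after both producers: the ordering is inferred
             * from the overlapping in/out footprints, not from an
             * explicit taskwait placed between the tasks. */
            #pragma omp task depend(in: a, b) depend(out: sum)
            sum = a + b;
        }   /* all tasks are complete by the end of the parallel region */

        printf("sum = %d\n", sum);                 /* prints sum = 42 */
        return 0;
    }

    Without the depend clauses, an explicit #pragma omp taskwait would be needed between the producers and the consumer; with them, the ordering is implied by the declared data footprints, which is the property PARTEE extends to nested tasks.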

  • Inference and Declaration of Independence in Task Parallel Programs
    APPT 2013: Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies, Volume 8299, 2013
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos
    Abstract:

    The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances in task-parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overhead because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races. We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.
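
    To make the runtime cost discussed in the abstract concrete, here is a hypothetical sketch of the overlap test a dynamic dependence analysis performs on declared task footprints; footprint_t and footprints_conflict() are invented names, not SCOOP's or any real runtime's API. A static analysis of the kind described would simply skip registering arguments it can prove never conflict, eliminating exactly this per-task work.

    /* Hypothetical sketch of dynamic dependence checking on declared
     * footprints (base address, length, access mode).  The names are
     * made up for illustration and are not part of SCOOP. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef enum { ACCESS_IN, ACCESS_OUT, ACCESS_INOUT } access_t;

    typedef struct {
        const char *base;   /* start of the accessed region */
        size_t      len;    /* size of the accessed region  */
        access_t    mode;   /* declared access mode         */
    } footprint_t;

    /* Two footprints conflict only if the regions overlap and at least
     * one of the accesses writes; only conflicting tasks need ordering. */
    static bool footprints_conflict(footprint_t a, footprint_t b) {
        bool overlap  = a.base < b.base + b.len && b.base < a.base + a.len;
        bool a_writes = a.mode != ACCESS_IN;
        bool b_writes = b.mode != ACCESS_IN;
        return overlap && (a_writes || b_writes);
    }

    int main(void) {
        double x[100];
        footprint_t t1 = { (const char *)x,        50 * sizeof *x, ACCESS_OUT };
        footprint_t t2 = { (const char *)(x + 50), 50 * sizeof *x, ACCESS_OUT };
        footprint_t t3 = { (const char *)x,       100 * sizeof *x, ACCESS_IN  };

        /* Disjoint halves: a static analysis could prove independence
         * and omit the runtime check for these two tasks. */
        printf("t1 vs t2: %s\n", footprints_conflict(t1, t2) ? "conflict" : "independent");
        /* A read of the whole array against a write of one half: the
         * dynamic analysis must keep these tasks ordered. */
        printf("t1 vs t3: %s\n", footprints_conflict(t1, t3) ? "conflict" : "independent");
        return 0;
    }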

  • Inference and Declaration of Independence: Impact on Deterministic Task Parallelism
    International Conference on Parallel Architectures and Compilation Techniques, 2012
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos
    Abstract:

    We present a set of static techniques that reduce runtime overheads in task-parallel programs with implicit synchronization. We use a static dependence analysis to detect non-conflicting tasks and remove unnecessary runtime checks. We further reduce overheads by statically optimizing task creation and the management of runtime metadata. We implement these optimizations in SCOOP, a source-to-source compiler for such a programming model and runtime system. We evaluate SCOOP on 10 representative benchmarks and show that our approach improves performance by 12% on average.
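
    One way to picture the "declaration of independence" in this abstract and the previous one: when independence is established, a task can be spawned with no dependence metadata at all, so the runtime never tracks or checks its arguments. The sketch below uses plain OpenMP tasks rather than SCOOP's generated code; scale_row() and the row-blocked layout are assumptions made for the example.

    /* Sketch of tasks "declared independent": each task owns one row,
     * rows never overlap, so the tasks carry no depend clauses and the
     * runtime keeps no dependence metadata for them.  Plain OpenMP,
     * not the code SCOOP actually emits. */
    #include <stddef.h>

    static void scale_row(double *row, size_t n, double factor) {
        for (size_t j = 0; j < n; j++)
            row[j] *= factor;
    }

    void scale_matrix(double *m, size_t rows, size_t cols, double factor) {
        #pragma omp parallel
        #pragma omp single
        {
            for (size_t r = 0; r < rows; r++) {
                /* No depend clause: independence is declared, so there
                 * is nothing for the runtime to track or check. */
                #pragma omp task firstprivate(r)
                scale_row(m + r * cols, cols, factor);
            }
            #pragma omp taskwait   /* one explicit join for the batch */
        }
    }

    Contrast this with the depend-clause example after the first abstract above: there the ordering is inferred at runtime from declared footprints, while here the absence of declared footprints is itself the saving the abstract measures.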

Foivos S. Zakkak - One of the best experts on this subject based on the ideXlab platform.

  • Hierarchical Parallel Dynamic Dependence Analysis for Recursively Task-Parallel Programs
    2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016
    Co-Authors: Nikolaos Papakonstantinou, Foivos S. Zakkak, Polyvios Pratikakis

  • Inference and Declaration of Independence in Task Parallel Programs
    APPT 2013: Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies, Volume 8299, 2013
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos

  • Inference and Declaration of Independence: Impact on Deterministic Task Parallelism
    International Conference on Parallel Architectures and Compilation Techniques, 2012
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos

Dimitrios S Nikolopoulos - One of the best experts on this subject based on the ideXlab platform.

  • Inference and Declaration of Independence in Task Parallel Programs
    APPT 2013: Revised Selected Papers of the 10th International Symposium on Advanced Parallel Processing Technologies, Volume 8299, 2013
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos

  • Inference and Declaration of Independence: Impact on Deterministic Task Parallelism
    International Conference on Parallel Architectures and Compilation Techniques, 2012
    Co-Authors: Foivos S. Zakkak, Polyvios Pratikakis, Dimitrios Chasapis, Angelos Bilas, Dimitrios S Nikolopoulos

Ajith Abraham - One of the best experts on this subject based on the ideXlab platform.

  • CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi
    Springer Science and Business Media LLC, 2019
    Co-Authors: André Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham

  • CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi
    The Journal of Supercomputing, 2019
    Co-Authors: André Viebke, Suejb Memeti, Sabri Pllana, Ajith Abraham
    Abstract:

    Deep learning is an important component of Big Data analytic tools and intelligent applications, such as self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNN), named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems that are accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits, using the total number of threads on the Xeon Phi, show speedups of up to 103× compared to execution on one thread of the Xeon Phi, 14× compared to sequential execution on an Intel Xeon E5, and 58× compared to sequential execution on an Intel Core i5.
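
    The "implicit synchronization in arbitrary order" in CHAOS is in the spirit of Hogwild-style training, where workers apply weight updates to shared memory without locks and without an agreed-upon order. The sketch below is a generic illustration of that idea on a toy linear model in C with OpenMP; it is not the CHAOS implementation, not a CNN, and not tuned for the Xeon Phi, and the data, sizes, and learning rate are invented for the example.

    /* Generic Hogwild-style sketch (plain C + OpenMP), not CHAOS itself:
     * worker threads write gradient updates straight into the shared
     * weight vector, with no locks and no fixed update order; the
     * resulting races are tolerated rather than prevented. */
    #include <stdio.h>

    #define N_FEATURES 4
    #define N_SAMPLES  10000
    #define LR         0.01

    int main(void) {
        static double x[N_SAMPLES][N_FEATURES], y[N_SAMPLES];
        double w[N_FEATURES] = {0};

        /* Synthetic data for a toy linear model: y = 1*x0 + 2*x1 + 3*x2 + 4*x3. */
        for (int i = 0; i < N_SAMPLES; i++)
            for (int j = 0; j < N_FEATURES; j++) {
                x[i][j] = (double)((i + j) % 10) / 10.0;
                y[i]   += (j + 1) * x[i][j];
            }

        /* Each thread sweeps its share of the samples and updates the
         * shared w[] in place; the unsynchronized, arbitrarily ordered
         * updates are the point of the example. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N_SAMPLES; i++) {
            double pred = 0.0;
            for (int j = 0; j < N_FEATURES; j++)
                pred += w[j] * x[i][j];
            double err = pred - y[i];
            for (int j = 0; j < N_FEATURES; j++)
                w[j] -= LR * err * x[i][j];    /* racy, lock-free update */
        }

        for (int j = 0; j < N_FEATURES; j++)
            printf("w[%d] = %.3f\n", j, w[j]); /* values vary slightly from run to run */
        return 0;
    }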