Data Parallelism

The experts below are selected from a list of 50,589 experts worldwide, ranked by the ideXlab platform.

P Banerjee - One of the best experts on this subject based on the ideXlab platform.

  • A framework for exploiting task and data parallelism on distributed memory multicomputers
    IEEE Transactions on Parallel and Distributed Systems, 1997
    Co-Authors: Shankar Ramaswamy, Sachin S Sapatnekar, P Banerjee
    Abstract:

    Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this provides increased performance is that data parallelism provides diminishing returns as the number of processors used is increased. By controlling the number of processors used for each data-parallel task in an application and by concurrently executing these tasks, we make program execution more efficient and, therefore, faster.
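
    An illustrative sketch of the idea (not the PARADIGM compiler's actual output): instead of running two independent data-parallel tasks one after the other on all available workers, each task is given its own smaller group of workers and the two tasks execute concurrently. The worker counts, task sizes, and the trivial sum kernel below are assumptions chosen only to show the structure.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def data_parallel_task(name, chunks, num_workers):
        """Run one data-parallel reduction on its own group of num_workers processes."""
        with ProcessPoolExecutor(max_workers=num_workers) as group:
            partial_sums = list(group.map(sum, chunks))   # the data-parallel part
        return name, sum(partial_sums)

    if __name__ == "__main__":
        chunks_a = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
        chunks_b = [list(range(i, i + 1000)) for i in range(0, 4000, 1000)]

        # Task parallelism controls the degree of data parallelism: rather than
        # giving all four workers first to task A and then to task B, give two
        # workers to each task and run the two tasks at the same time.
        with ThreadPoolExecutor(max_workers=2) as tasks:
            futures = [tasks.submit(data_parallel_task, "A", chunks_a, 2),
                       tasks.submit(data_parallel_task, "B", chunks_b, 2)]
            for f in futures:
                print(f.result())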

  • Compiling MATLAB programs to ScaLAPACK: exploiting task and data parallelism
    International Conference on Parallel Processing, 1996
    Co-Authors: Shankar Ramaswamy, Eugene W Hodges, P Banerjee
    Abstract:

    We suggest a new approach aimed at reducing the effort required to program distributed-memory multicomputers. The key idea in our approach is to automatically convert a program written in a library-based programming language (MATLAB) to a parallel program based on the ScaLAPACK parallel library. In the process of performing this conversion, we apply compiler optimizations that simultaneously exploit task and data parallelism. As our results show, our approach is feasible and practical, and our optimization provides significant performance benefits.
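
    As a hedged illustration of what such a conversion involves (this is a toy, not the authors' compiler): a MATLAB-level matrix expression is lowered into calls on distributed library routines. The routine names below (PDGEMM, PDGESV, PDGEADD) are real ScaLAPACK/PBLAS routines, but the mapping and the temporary T1 are invented for illustration.

    def lower(op, *operands):
        """Toy lowering of one MATLAB-level operation to a library-call tuple."""
        routines = {
            "matmul": "PDGEMM",   # C = A * B
            "solve":  "PDGESV",   # X = A \ B
            "add":    "PDGEADD",  # C = A + B
        }
        return (routines[op], operands)

    # MATLAB source:  D = (A * B) \ C
    # Generated call sequence, with T1 holding the intermediate product:
    calls = [lower("matmul", "A", "B", "T1"),
             lower("solve", "T1", "C", "D")]
    for routine, operands in calls:
        print(routine, operands)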

  • Simultaneous exploitation of task and data parallelism in regular scientific applications
    1996
    Co-Authors: P Banerjee, Shankar Ramaswamy
    Abstract:

    Distributed Memory Multicomputers (DMMs) such as the IBM SP-2, the Intel Paragon and the Thinking Machines CM-5 offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this thesis we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework and as part of a MATLAB compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. The reason this provides increased performance is that data parallelism provides diminishing returns as the number of processors used is increased. By controlling the number of processors used for each data-parallel task in an application and by concurrently executing these tasks, we make program execution more efficient and therefore faster. A practical implementation of a task- and data-parallel scheme of execution for an application on a distributed memory multicomputer also involves data redistribution, which introduces an overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.
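
    A back-of-the-envelope sketch of why this can win (the cost model, serial fraction, and redistribution cost below are assumed numbers, not measurements from the thesis): with an Amdahl-style speedup curve, each task sees diminishing returns from extra processors, so running two tasks concurrently on half the machine each can beat running them one after the other on the whole machine, even after paying for the data redistribution.

    def run_time(work, procs, serial_fraction=0.05):
        """Amdahl-style time for one data-parallel task on procs processors."""
        return work * (serial_fraction + (1.0 - serial_fraction) / procs)

    work_a, work_b, procs = 100.0, 100.0, 64
    redistribution = 2.0   # assumed cost of redistributing data between phases

    # Pure data parallelism: task A, then task B, each on all 64 processors.
    pure_dp = run_time(work_a, procs) + run_time(work_b, procs)

    # Task + data parallelism: A and B run concurrently on 32 processors each,
    # after one data redistribution.
    task_dp = redistribution + max(run_time(work_a, procs // 2),
                                   run_time(work_b, procs // 2))

    print(f"pure data parallelism:   {pure_dp:.2f}")    # ~12.97
    print(f"task + data parallelism: {task_dp:.2f}")    # ~9.97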

  • A convex programming approach for exploiting data and functional parallelism on distributed memory multicomputers
    International Conference on Parallel Processing, 1994
    Co-Authors: Shankar Ramaswamy, Sachin S Sapatnekar, P Banerjee
    Abstract:

    In the past, compilers have focused on exploiting either functional or data parallelism. The PARADIGM compiler project at the University of Illinois is among the first to incorporate techniques for the simultaneous exploitation of both. This paper describes the techniques used in the PARADIGM compiler and analyzes their optimality. It is the first work of its kind to use realistic cost models that include data transfer costs, which all previous researchers have neglected. Preliminary results on the CM-5 show the efficacy of our methods and the significant advantages of using functional and data parallelism together for the execution of real applications.
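
    The sketch below is a much-simplified stand-in for that allocation problem (the paper formulates it as a convex program; here it is a brute-force search, and the cost-model coefficients are invented): choose how many of P processors to give each of two concurrently executing tasks so that the finish time, including a data-transfer term, is minimised.

    import math

    def task_time(compute, comm, procs):
        """Assumed cost model: computation shrinks with procs, communication grows."""
        return compute / procs + comm * math.log2(procs)

    def transfer_time(procs):
        """Assumed data-transfer (redistribution) cost for a group of procs."""
        return 0.5 * procs

    P = 64
    tasks = [(1000.0, 3.0), (400.0, 2.0)]   # (compute, comm) per task, invented

    best_time, best_split = min(
        (max(task_time(*tasks[0], p), task_time(*tasks[1], P - p))
         + transfer_time(p) + transfer_time(P - p),
         p)
        for p in range(1, P)
    )
    print(f"best split: {best_split} vs {P - best_split} processors, "
          f"finish time {best_time:.1f}")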

Shankar Ramaswamy - One of the best experts on this subject based on the ideXlab platform.

  • A framework for exploiting task and data parallelism on distributed memory multicomputers
    IEEE Transactions on Parallel and Distributed Systems, 1997
    Co-Authors: Shankar Ramaswamy, Sachin S Sapatnekar, P Banerjee
    Abstract: see the identical entry under P Banerjee above.

  • Compiling MATLAB programs to ScaLAPACK: exploiting task and data parallelism
    International Conference on Parallel Processing, 1996
    Co-Authors: Shankar Ramaswamy, Eugene W Hodges, P Banerjee
    Abstract: see the identical entry under P Banerjee above.

  • Simultaneous exploitation of task and data parallelism in regular scientific applications
    1996
    Co-Authors: P Banerjee, Shankar Ramaswamy
    Abstract: see the identical entry under P Banerjee above.

  • A convex programming approach for exploiting data and functional parallelism on distributed memory multicomputers
    International Conference on Parallel Processing, 1994
    Co-Authors: Shankar Ramaswamy, Sachin S Sapatnekar, P Banerjee
    Abstract: see the identical entry under P Banerjee above.

Jie Jiang - One of the best experts on this subject based on the ideXlab platform.

  • Angel: a new large-scale machine learning system
    National Science Review, 2018
    Co-Authors: Jie Jiang, Jiawei Jiang, Yuhong Liu, Bin Cui
    Abstract:

    Machine learning (ML) techniques are now ubiquitous tools for extracting structural information from data collections. With the increasing volume of data, large-scale ML applications require an efficient implementation to accelerate performance. Existing systems parallelize algorithms through either data parallelism or model parallelism. But data parallelism cannot obtain good statistical efficiency due to conflicting updates to parameters, while performance is damaged by global barriers in model-parallel methods. In this paper, we propose a new system, named Angel, to facilitate the development of large-scale ML applications in production environments. By allowing concurrent updates to the model across different groups and scheduling the updates within each group, Angel can achieve a good balance between hardware efficiency and statistical efficiency. Besides, Angel reduces network latency by overlapping parameter pulling with update computation, and it utilizes the sparseness of data to avoid pulling unnecessary parameters. We also enhance the usability of Angel by providing a set of efficient tools to integrate with application pipelines and by provisioning efficient fault-tolerance mechanisms. We conduct extensive experiments to demonstrate the superiority of Angel.
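
    A toy, single-process illustration of the parameter-server pattern the abstract describes (this is not Angel's API; the model size, learning rate, and data below are made up): each worker pulls only the parameters its sparse data partition actually touches, computes a local update, and pushes it back to the server-held model.

    import numpy as np

    model = np.zeros(1000)          # parameters held by the "server"

    def worker_step(rows, labels, lr=0.1):
        """One worker: pull touched parameters, compute a sparse update."""
        touched = sorted({j for row in rows for j in row})
        local = {j: model[j] for j in touched}            # pull only what is needed
        grad = {j: 0.0 for j in touched}
        for row, y in zip(rows, labels):
            pred = sum(local[j] for j in row)             # sparse dot product
            for j in row:
                grad[j] += pred - y                       # squared-loss gradient
        return {j: -lr * g / len(rows) for j, g in grad.items()}

    # Two worker groups hold disjoint data partitions; in Angel's setting their
    # updates proceed concurrently, here we simply apply them in turn.
    partitions = [
        ([[1, 5, 9], [5, 42]], [1.0, 0.0]),
        ([[200, 300], [300, 301, 302]], [0.5, 1.5]),
    ]
    for rows, labels in partitions:
        for j, delta in worker_step(rows, labels).items():
            model[j] += delta                             # push the update
    print("nonzero parameters:", np.flatnonzero(model))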

Simon Jones - One of the best experts on this subject based on the ideXlab platform.

  • Harnessing the multicores: nested data parallelism in Haskell
    Asian Symposium on Programming Languages and Systems, 2008
    Co-Authors: Simon Jones
    Abstract:

    If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn't make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is to use data parallelism. Blelloch's pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this talk I will describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. I will focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism, and I hope to present performance numbers.
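
    A small sketch of the flattening idea behind vectorisation (NESL-style, written here in Python/NumPy rather than Haskell, and not GHC's actual transformation): a nested array is represented as one flat data array plus a segment descriptor, and the nested operation "sum each sub-array" becomes a single segmented reduction over the flat array, i.e. flat data parallelism.

    import numpy as np

    nested = [[1, 2, 3], [40], [5, 6]]           # the nested (irregular) value

    # Flat representation: contiguous data plus a segment descriptor.
    flat = np.array([x for xs in nested for x in xs])
    segment_lengths = np.array([len(xs) for xs in nested])
    segment_starts = np.concatenate(([0], np.cumsum(segment_lengths)[:-1]))

    # The nested reduction becomes one flat, segmented reduction.
    segmented_sums = np.add.reduceat(flat, segment_starts)
    print(segmented_sums)                        # [ 6 40 11]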

  • Harnessing the multicores: nested data parallelism in Haskell
    Foundations of Software Technology and Theoretical Computer Science, 2008
    Co-Authors: Simon Jones, Roman Leshchinskiy, Gabriele Keller, Manuel M T Chakravarty
    Abstract:

    If you want to program a parallel computer, a purely functional language like Haskell is a promising starting point. Since the language is pure, it is by-default safe for parallel evaluation, whereas imperative languages are by-default unsafe. But that doesn't make it easy! Indeed it has proved quite difficult to get robust, scalable performance increases through parallel functional programming, especially as the number of processors increases. A particularly promising and well-studied approach to employing large numbers of processors is data parallelism. Blelloch's pioneering work on NESL showed that it was possible to combine a rather flexible programming model (nested data parallelism) with a fast, scalable execution model (flat data parallelism). In this paper we describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC. We focus particularly on the vectorisation transformation, which transforms nested to flat data parallelism.

Bin Cui - One of the best experts on this subject based on the ideXlab platform.

  • Angel: a new large-scale machine learning system
    National Science Review, 2018
    Co-Authors: Jie Jiang, Jiawei Jiang, Yuhong Liu, Bin Cui
    Abstract: see the identical entry under Jie Jiang above.