Parallel Program

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 97,611 Experts worldwide, ranked by the ideXlab platform

Jeffery Von Ronne - One of the best experts on this subject based on the ideXlab platform.

  • a task uncoordinated distributed dataflow model for scalable high performance Parallel Program execution
    Parallel Computing, 2016
    Co-Authors: Lucas A Wilson, Jeffery Von Ronne
    Abstract:

    Highlights: We describe a novel model for executing distributed memory Parallel Programs using uncoordinated tasks. We describe several off-line optimizations for the proposed model. We examine the effects of these optimizations on modern processors with wider vector units. Increasing levels of task coalescence can improve throughput and increase performance. Increases in performance are observed in both single node and multi node experiments.

    We propose a distributed dataflow execution model which utilizes a distributed dictionary for data memoization, allowing each Parallel task to schedule instructions without direct inter-task coordination. We provide a description of the proposed model, including autonomous dataflow task selection. We also describe a set of optimization strategies which improve overall throughput of stencil Programs executed using this model on modern multi-core and vectorized architectures.
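A minimal sketch of the two ideas named in this abstract, under assumptions of our own (this is not the authors' implementation): a shared dictionary memoizes stencil cell values keyed by (timestep, index), and "task coalescence" is modeled by having one task compute a contiguous block of cells at once rather than a single cell.

```python
# Illustrative sketch only: a memoizing dictionary plus coalesced
# block-tasks for a 1-D, 3-point averaging stencil. The stencil, the
# key scheme, and all names here are assumptions for illustration.

def stencil_step(values, t, lo, hi, width):
    """Compute cells lo..hi-1 at timestep t from timestep t-1,
    using a 3-point average with clamped boundaries."""
    for i in range(lo, hi):
        left = values[(t - 1, max(i - 1, 0))]
        mid = values[(t - 1, i)]
        right = values[(t - 1, min(i + 1, width - 1))]
        values[(t, i)] = (left + mid + right) / 3.0

def run(initial, steps, block):
    """Advance the stencil; each coalesced task covers `block` cells."""
    width = len(initial)
    values = {(0, i): v for i, v in enumerate(initial)}
    for t in range(1, steps + 1):
        for lo in range(0, width, block):
            stencil_step(values, t, lo, min(lo + block, width), width)
    return [values[(steps, i)] for i in range(width)]
```

Larger `block` values correspond to higher coalescence: fewer, wider tasks with better locality and more opportunity for vectorization, which is the trade-off the highlights describe.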

  • a distributed dataflow model for task uncoordinated Parallel Program execution
    International Conference on Parallel Processing, 2014
    Co-Authors: Lucas A Wilson, Jeffery Von Ronne
    Abstract:

    High Performance Computing (HPC) systems now consist of many thousands of individual servers. While relatively scalable and cost effective, these systems suffer from a complexity of scale that will not improve with increasing machine size. It will become increasingly difficult, if not impossible, for HPC systems to maintain node availability long enough for worthwhile scientific calculations to be performed. Existing execution and Programming models, which are dependent on guaranteed hardware reliability, are not well suited to future distributed memory Parallel systems where hardware reliability cannot be guaranteed. We propose a distributed dataflow execution model which utilizes a distributed dictionary for data memoization, allowing each Parallel task to schedule instructions without direct inter-task coordination. We provide a description of the proposed execution model, including Program formulation and autonomous dataflow task selection. Experiments performed demonstrate the proposed model's ability to automatically distribute work across tasks, as well as the proposed model's ability to scale in both shared memory and distributed memory.
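The autonomous-task-selection idea can be sketched as follows (a hedged illustration, not the paper's code; the `Instr` record, `autonomous_worker`, and `execute` are names we introduce here): each task independently scans the program and executes any instruction whose inputs are already memoized in the shared dictionary. Tasks never coordinate, and recomputing a value is harmless because the dictionary stores the same result either way.

```python
# Sketch of uncoordinated, memoization-driven instruction selection.
from collections import namedtuple

# out: key this instruction produces; inputs: keys it consumes;
# op: pure function applied to the input values.
Instr = namedtuple("Instr", ["out", "inputs", "op"])

def autonomous_worker(program, memo):
    """One pass of a single uncoordinated task: run whatever is ready."""
    progress = False
    for ins in program:
        if ins.out in memo:
            continue  # already computed (possibly by another task)
        if all(k in memo for k in ins.inputs):
            memo[ins.out] = ins.op(*(memo[k] for k in ins.inputs))
            progress = True
    return progress

def execute(program, inputs):
    """Repeated passes stand in for many tasks scanning independently."""
    memo = dict(inputs)
    while autonomous_worker(program, memo):
        pass
    return memo
```

Because instructions fire purely on the presence of their inputs in the dictionary, no task needs to know what any other task is doing, which is the coordination-free property the abstract emphasizes.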

John L Hennessy - One of the best experts on this subject based on the ideXlab platform.

  • mtool an integrated system for performance debugging shared memory multiprocessor applications
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Aaron J Goldberg, John L Hennessy
    Abstract:

    The authors describe Mtool, a software tool for analyzing performance losses in shared memory Parallel Programs. Mtool augments a Program with low overhead instrumentation which perturbs the Program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the Parallel Program, the Programmer can use Mtool's window-based user interface to view compute time, memory, and synchronization objects. The authors describe Mtool's low overhead instrumentation methods, memory bottleneck detection technique, and attention focusing mechanisms, contrast Mtool with other approaches, and offer a case study to demonstrate its effectiveness.
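A toy sketch of the kind of measurement such instrumentation performs (this is not Mtool itself, and the `InstrumentedLock` class is our own illustration): wrap lock acquisition so that time spent blocked on synchronization is accumulated separately from compute time, letting a later report attribute losses to synchronization bottlenecks.

```python
# Illustrative lock wrapper that records synchronization wait time.
import threading
import time

class InstrumentedLock:
    def __init__(self):
        self._lock = threading.Lock()
        self.wait_time = 0.0  # total seconds spent blocked on this lock

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()          # blocking wait is what we measure
        self.wait_time += time.perf_counter() - start
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False
```

A thread that does `with lock: ...` pays only two clock reads per acquisition, which is in the spirit of the low-overhead instrumentation the abstract describes; per-lock `wait_time` totals then point at the contended synchronization objects.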

A N Salnikov - One of the best experts on this subject based on the ideXlab platform.

  • parus a Parallel Programming framework for heterogeneous multiprocessor systems
    Lecture Notes in Computer Science, 2006
    Co-Authors: A N Salnikov
    Abstract:

    PARUS is a Parallel Programming framework that allows building Parallel Programs in data flow graph notation. The data flow graph is created by the developer, either manually or automatically with the help of a script. The graph is then converted to C++/MPI source code and linked with the PARUS runtime system. The next step is the Parallel Program's execution on a cluster or multiprocessor system. PARUS also implements some approaches for load balancing on heterogeneous multiprocessor systems. There is a set of MPI tests that allow the developer to estimate communication performance in a multiprocessor or cluster.
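The dataflow-graph notation such a framework builds on can be sketched in a few lines (an assumption-laden illustration, not PARUS code: the `deps` representation and `topological_order` are ours): each node lists the nodes it depends on, and a topological sort yields a valid execution order of the kind a code generator could then emit.

```python
# Sketch: acyclic dataflow graph as node -> list of dependencies,
# scheduled by depth-first topological sort. Assumes no cycles.

def topological_order(deps):
    """Return nodes of `deps` in an order where every node appears
    after all of its dependencies."""
    order, done = [], set()

    def visit(node):
        if node in done:
            return
        for d in deps.get(node, []):
            visit(d)   # schedule dependencies first
        done.add(node)
        order.append(node)

    for node in deps:
        visit(node)
    return order
```

On a real heterogeneous system the interesting work starts after this point, assigning the ordered nodes to processors of differing speed, which is where the load-balancing approaches mentioned in the abstract come in.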
