Runtime Library

The Experts below are selected from a list of 912 Experts worldwide, ranked by the ideXlab platform

Panagiotis E. Hadjidoukas - One of the best experts on this subject based on the ideXlab platform.

  • PDP - A Runtime Library for Platform-Independent Task Parallelism
    2012 20th Euromicro International Conference on Parallel Distributed and Network-based Processing, 2012
    Co-Authors: Panagiotis E. Hadjidoukas, Evaggelos Lappas, Vassilios V Dimakopoulos
    Abstract:

    With the increasing diversity of computing systems and the rapid performance improvement of commodity hardware, heterogeneous clusters have become the dominant platform for low-cost, high-performance computing. Grid-enabled and heterogeneous implementations of MPI establish it as the de facto programming model for these environments. Task parallelism, in turn, provides a natural way to exploit their hierarchical architecture, a hierarchy further extended by the advent of general-purpose GPU devices. In this paper we present the implementation of an MPI-based task Library for heterogeneous and GPU clusters. The Library offers an intuitive programming interface for multilevel task parallelism with transparent data management and load balancing. We discuss design and implementation issues regarding heterogeneity support and report performance results on heterogeneous cluster computing environments.
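
    The abstract does not show the Library's interface, but the load-balancing pattern such task libraries automate is easy to picture. Below is a minimal MPI master-worker task pool in C, a sketch only: the tags, the stop-signal protocol, and run_task are illustrative assumptions, not the paper's API, and it assumes at least as many tasks as workers.

    ```c
    /* Minimal MPI master-worker task pool: the load-balancing skeleton that
     * task libraries like the one above automate and extend with transparent
     * data management and multilevel spawning. Illustrative sketch only. */
    #include <mpi.h>
    #include <stdio.h>

    #define NTASKS   100   /* assumes NTASKS >= number of workers */
    #define TAG_WORK 1
    #define TAG_DONE 2

    static double run_task(int id) { return (double)id * id; } /* stand-in kernel */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                  /* master: hand out task ids */
            int next = 0, active = 0, stop = -1;
            double res;
            MPI_Status st;
            for (int w = 1; w < size && next < NTASKS; w++) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            }
            while (active > 0) {
                MPI_Recv(&res, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD, &st);
                if (next < NTASKS) {      /* keep the fastest workers busy */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next++;
                } else {                  /* negative id signals shutdown */
                    MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    active--;
                }
            }
        } else {                          /* worker: compute until stopped */
            int id;
            for (;;) {
                MPI_Recv(&id, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (id < 0) break;
                double res = run_task(id);
                MPI_Send(&res, 1, MPI_DOUBLE, 0, TAG_DONE, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }
    ```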

  • HPCS - Task-parallel global optimization with application to protein folding
    2011 International Conference on High Performance Computing & Simulation, 2011
    Co-Authors: C. Voglis, Panagiotis E. Hadjidoukas, Vassilios V Dimakopoulos, Isaac E. Lagaris, D.g. Papageorgiou
    Abstract:

    This paper presents a software framework for high performance numerical global optimization. At the core, a Runtime Library implements a programming environment for irregular and adaptive task-based parallelism. Building on this, we extract and exploit the multilevel parallelism of a global optimization application that is based on numerical differentiation and Newton-based local optimizations. Our framework is used in the efficient parallelization of a real application case that concerns the protein folding problem. The experimental evaluation presents performance results of our software system on a multicore cluster.
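
    One concrete source of the multilevel parallelism described above: in numerical differentiation, every finite-difference gradient component is an independent task. The paper dispatches such tasks through its Runtime Library; the sketch below renders the same idea in plain OpenMP with a toy objective (forward differences assumed, not the paper's actual code).

    ```c
    /* Each finite-difference gradient component is an independent task;
     * here a plain OpenMP loop evaluates them concurrently. */
    #include <stdio.h>
    #include <omp.h>

    #define N 8

    static double f(const double *x) {            /* toy objective */
        double s = 0.0;
        for (int i = 0; i < N; i++) s += x[i] * x[i];
        return s;
    }

    int main(void) {
        double x[N], g[N], h = 1e-6;
        for (int i = 0; i < N; i++) x[i] = 1.0;

        /* One forward-difference evaluation per component, all independent. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double xp[N];
            for (int j = 0; j < N; j++) xp[j] = x[j];
            xp[i] += h;
            g[i] = (f(xp) - f(x)) / h;            /* expect ~2.0 everywhere */
        }

        for (int i = 0; i < N; i++) printf("g[%d] = %.3f\n", i, g[i]);
        return 0;
    }
    ```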

  • Portable Runtime Support and Exploitation of Nested Parallelism in OpenMP
    2004
    Co-Authors: Panagiotis E. Hadjidoukas, Laurent Amsaleg
    Abstract:

    In this paper, we present an alternative implementation of the NANOS OpenMP Runtime Library (NthLib) that targets portability and efficient support of multiple levels of parallelism. We have implemented the Runtime libraries of available open-source OpenMP compilers on top of NthLib, thus reducing their overheads and providing them with inherent support for nested parallelism. In addition, we present an experimental implementation of the workqueuing model and the parallelization of a data clustering algorithm using OpenMP directives. The asymmetry and non-determinism of this algorithm necessitate the exploitation of its nested loop-level parallelism. The experimental results on an SMP server with four processors demonstrate the efficiency of our OpenMP Runtime support.
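
    The nested loop-level parallelism mentioned for the clustering algorithm can be pictured as follows: outer iterations do very different amounts of work, so they are dynamically scheduled and each additionally opens an inner team. A plain OpenMP sketch with made-up workloads, not NthLib's internal interface:

    ```c
    /* Asymmetric outer iterations: dynamic scheduling at the outer level,
     * a nested team at the inner level. Plain OpenMP, illustrative only. */
    #include <stdio.h>
    #include <omp.h>

    #define CLUSTERS 8

    int main(void) {
        double cost[CLUSTERS] = {0};
        omp_set_max_active_levels(2);                 /* enable nesting */

        #pragma omp parallel for schedule(dynamic, 1) /* asymmetric outer work */
        for (int c = 0; c < CLUSTERS; c++) {
            int points = 1000 * (c + 1);              /* cluster sizes differ */
            double s = 0.0;
            #pragma omp parallel for reduction(+:s) num_threads(2)
            for (int p = 0; p < points; p++)
                s += p * 1e-6;
            cost[c] = s;
        }
        for (int c = 0; c < CLUSTERS; c++)
            printf("cost[%d] = %.3f\n", c, cost[c]);
        return 0;
    }
    ```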

  • ISHPC - OpenMP for Adaptive Master-Slave Message Passing Applications
    Lecture Notes in Computer Science, 2003
    Co-Authors: Panagiotis E. Hadjidoukas, Eleftherios D. Polychronopoulos, Theodore S. Papatheodorou
    Abstract:

    This paper presents a prototype Runtime environment for programming and executing adaptive master-slave message passing applications on clusters of multiprocessors. A sophisticated portable Runtime Library provides transparent load balancing and exports a convenient application programming interface (API) for multilevel fork-join RPC-like parallelism on top of the Message Passing Interface. This API can be used directly or through OpenMP directives. A source-to-source translator converts programs that use an extended version of the OpenMP workqueuing execution model into equivalent programs with calls to the Runtime Library. Experimental results show that our Runtime environment combines the simplicity of OpenMP with the performance of message passing.
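
    The extended workqueuing execution model predates the task construct that OpenMP 3.0 later standardized; a present-day rendering of the master-slave pattern it expresses might look like the sketch below (not the translator's actual output; serve is a stand-in for the slave computation).

    ```c
    /* Master-slave work distribution with OpenMP tasks, the standardized
     * successor of the workqueuing model. Illustrative sketch. */
    #include <stdio.h>
    #include <omp.h>

    static double serve(int req) { return req * 0.5; }  /* stand-in slave work */

    int main(void) {
        double out[16];
        #pragma omp parallel
        #pragma omp single              /* the "master" creates the work units */
        for (int req = 0; req < 16; req++) {
            #pragma omp task firstprivate(req) shared(out)
            out[req] = serve(req);      /* executed by any available "slave" */
        }                               /* implicit barrier joins the tasks */
        printf("out[0]=%.1f out[15]=%.1f\n", out[0], out[15]);
        return 0;
    }
    ```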

Jesus Labarta - One of the best experts on this subject based on the ideXlab platform.

  • SC - CellSs: a programming model for the cell BE architecture
    Proceedings of the 2006 ACM IEEE conference on Supercomputing - SC '06, 2006
    Co-Authors: Pieter Bellens, Josep M. Perez, Rosa M. Badia, Jesus Labarta
    Abstract:

    In this work we present Cell superscalar (CellSs), which addresses the automatic exploitation of the functional parallelism of a sequential program across the different processing elements of the Cell BE architecture. The focus is on the simplicity and flexibility of the programming model. Based on a simple annotation of the source code, a source-to-source compiler generates the necessary code and a Runtime Library exploits the existing parallelism by building a task dependency graph at Runtime. The Runtime takes care of task scheduling and data handling between the different processors of this heterogeneous architecture. In addition, locality-aware task scheduling has been implemented to reduce the overhead of data transfers. The approach has been implemented and tested with a set of examples, and the results obtained so far are promising.
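
    A hedged sketch of the annotation style: the pragma spelling below is reconstructed from the StarSs family of publications and should be treated as an approximation. A plain C compiler ignores the pragma and runs the program sequentially; under CellSs, each annotated call would become a node in the Runtime task dependency graph, with the input/inout clauses driving the dependence analysis.

    ```c
    /* Annotation-driven tasking in the CellSs/StarSs style. The pragma is an
     * assumption about the spelling; a regular compiler ignores it and the
     * program keeps its sequential semantics. */
    #include <stdio.h>

    #define B 4

    #pragma css task input(a, b) inout(c)
    static void block_muladd(const float a[B][B], const float b[B][B],
                             float c[B][B]) {
        for (int i = 0; i < B; i++)
            for (int j = 0; j < B; j++)
                for (int k = 0; k < B; k++)
                    c[i][j] += a[i][k] * b[k][j];
    }

    int main(void) {
        float a[B][B] = {{1}}, b[B][B] = {{1}}, c[B][B] = {{0}};
        block_muladd(a, b, c);  /* under CellSs this spawns an async task */
        printf("c[0][0] = %.1f\n", c[0][0]);
        return 0;
    }
    ```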

  • WOMPAT - Runtime adjustment of parallel nested loops
    Lecture Notes in Computer Science, 2005
    Co-Authors: Alejandro Duran, Julita Corbalan, Raul E. Silvera, Jesus Labarta
    Abstract:

    OpenMP allows programmers to specify nested parallelism in parallel applications. In the case of scientific applications, parallel loops are the most important source of parallelism. In this paper we present an automatic mechanism to dynamically detect the best way to exploit the parallelism when having nested parallel loops. This mechanism is based on the number of threads, the problem size, and the number of iterations of the loop. To this end, we argue that programmers should specify the application's potential parallelism and give the Runtime the responsibility of deciding how best to exploit it. We have implemented this mechanism inside the IBM XL Runtime Library. Evaluation shows that our mechanism dynamically adapts the parallelism generated to the application and Runtime parameters, reaching the same speedup as the best static parallelization (with a priori information).
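
    A hand-rolled version of the decision the mechanism automates: parallelize the outer loop only when it has enough iterations to occupy the machine, otherwise fall through to the inner loop. The policy below is illustrative, not the IBM XL Runtime's actual heuristic.

    ```c
    /* Choose the parallel loop level at runtime with OpenMP if clauses:
     * when the outer trip count is too small, the inner loop gets the team. */
    #include <stdio.h>
    #include <omp.h>

    #define M 3      /* few outer iterations */
    #define N 1000   /* many inner iterations */

    int main(void) {
        static double a[M][N];
        int nthreads = omp_get_max_threads();
        int outer_par = (M >= nthreads);   /* enough outer work per thread? */

        #pragma omp parallel for if(outer_par)
        for (int i = 0; i < M; i++) {
            /* An inactive outer region does not count toward the active
             * nesting level, so the inner region can still get a full team. */
            #pragma omp parallel for if(!outer_par)
            for (int j = 0; j < N; j++)
                a[i][j] = i + j * 0.001;
        }
        printf("a[%d][%d] = %.3f\n", M - 1, N - 1, a[M - 1][N - 1]);
        return 0;
    }
    ```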

  • LCPC - OpenMP Extensions for Thread Groups and Their Run-Time Support
    Languages and Compilers for Parallel Computing, 2001
    Co-Authors: Marc Gonzalez, José Luis Hervás Oliver, Xavier Martorell, Eduard Ayguadé, Jesus Labarta, Nacho Navarro
    Abstract:

    This paper presents a set of proposals for the OpenMP shared-memory programming model oriented towards the definition of thread groups in the framework of nested parallelism. The paper also describes the additional functionalities required in the Runtime Library supporting the parallel execution. The extensions have been implemented in the OpenMP NanosCompiler and evaluated in a set of real applications and benchmarks. In this paper we present experimental results for one of these applications.
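
    With standard OpenMP, the effect of the proposed thread-group extensions can be approximated by nested regions whose team sizes are chosen per group, including unevenly; the sketch below is that approximation, not the NanosCompiler's actual clause syntax.

    ```c
    /* Thread groups approximated with nested regions: two groups whose
     * sizes differ, chosen at the point the inner region is opened. */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_max_active_levels(2);
        #pragma omp parallel num_threads(2)        /* one thread per group */
        {
            int group = omp_get_thread_num();
            int members = (group == 0) ? 3 : 1;    /* uneven group sizes */
            #pragma omp parallel num_threads(members)
            printf("group %d, member %d of %d\n",
                   group, omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }
    ```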

Joel H. Saltz - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Monte Carlo simulation of three-dimensional flow over a flat plate
    Journal of Thermophysics and Heat Transfer, 1995
    Co-Authors: Robert P. Nance, Richard G. Wilmoth, Bongki Moon, Hassan Hassan, Joel H. Saltz
    Abstract:

    This article describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime Library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a favorable load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate Mach 20 nitrogen flow over a finite-thickness flat plate. The parallel algorithm produces results that are very similar to previous DSMC results, despite the increased resolution available, and it yields significantly faster execution times than the scalar code, as well as very good load-balance and scalability characteristics.
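
    The one-dimensional chain-partitioning idea is simple to state in code: cells stay contiguous, and cuts are placed so that each processor receives roughly an equal share of the per-cell load. A greedy sequential sketch with made-up cell weights follows (the paper's remapping policies are more elaborate).

    ```c
    /* Greedy 1-D chain partitioning: contiguous cells, cuts placed when the
     * accumulated load reaches the per-processor target. */
    #include <stdio.h>

    #define NCELLS 12
    #define NPROCS 3

    int main(void) {
        double w[NCELLS] = {5, 1, 1, 2, 8, 3, 2, 2, 6, 1, 1, 4}; /* cell loads */
        double total = 0, target, acc = 0;
        int owner[NCELLS], p = 0;

        for (int i = 0; i < NCELLS; i++) total += w[i];
        target = total / NPROCS;

        for (int i = 0; i < NCELLS; i++) {
            owner[i] = p;
            acc += w[i];
            if (acc >= target && p < NPROCS - 1) {  /* cut the chain here */
                p++;
                acc = 0;
            }
        }
        for (int i = 0; i < NCELLS; i++)
            printf("cell %2d (load %.0f) -> proc %d\n", i, w[i], owner[i]);
        return 0;
    }
    ```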

  • IPPS - Interoperability of data parallel Runtime libraries
    Proceedings of the 11th International Parallel Processing Symposium, 1997
    Co-Authors: Guy Edjlali, Alan Sussman, Joel H. Saltz
    Abstract:

    This paper describes a framework for providing the ability to use multiple specialized data parallel libraries and/or languages within a single application. The ability to use multiple libraries is required in many application areas, such as multidisciplinary complex physical simulations and remote sensing image database applications. An application can consist of one program or multiple programs that use different libraries to parallelize operations on distributed data structures. The framework is embodied in a Runtime Library called Meta-Chaos that has been used to exchange data between data parallel programs written using High Performance Fortran, the Chaos and Multiblock Parti libraries developed at Maryland for handling various types of unstructured problems, and the Runtime Library for pC++, a data parallel version of C++ from Indiana University. Experimental results show that Meta-Chaos is able to move data between libraries efficiently and that Meta-Chaos provides effective support for complex applications.

  • IPPS - Data parallel programming in an adaptive environment
    Proceedings of the 9th International Parallel Processing Symposium, 1995
    Co-Authors: Guy Edjlali, Alan Sussman, Gagan Agrawal, Joel H. Saltz
    Abstract:

    For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at Runtime. In this paper, we discuss Runtime support for data parallel programming in such an adaptive environment. Executing data parallel programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a Runtime Library to provide this support. We also present performance results for a multiblock Navier-Stokes solver run on a network of workstations using PVM for message passing. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computations.
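
    The "new loop bounds" part of the problem is pure arithmetic: when the processor count changes, every process re-evaluates its block bounds from the same formula, and the Library then moves the data accordingly. Only the bounds computation is sketched below (a standard block distribution is assumed).

    ```c
    /* Block distribution bounds: what each process recomputes when the
     * processor count changes from 3 to 4. Data movement not shown. */
    #include <stdio.h>

    /* Bounds of process r when n elements are block-distributed over p. */
    static void block_bounds(int n, int p, int r, int *lo, int *hi) {
        int base = n / p, rem = n % p;
        *lo = r * base + (r < rem ? r : rem);
        *hi = *lo + base + (r < rem ? 1 : 0);   /* half-open [lo, hi) */
    }

    int main(void) {
        int n = 100;
        for (int q = 3; q <= 4; q++) {          /* processor count 3, then 4 */
            printf("with %d processes:\n", q);
            for (int r = 0; r < q; r++) {
                int lo, hi;
                block_bounds(n, q, r, &lo, &hi);
                printf("  rank %d owns [%d, %d)\n", r, lo, hi); /* new bounds */
            }
        }
        return 0;
    }
    ```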

Alok Choudhary - One of the best experts on this subject based on the ideXlab platform.

  • Multicollective I/O: A technique for exploiting inter-file access patterns
    ACM Transactions on Storage, 2006
    Co-Authors: Gokhan Memik, Mahmut Kandemir, Wei-keng Liao, Alok Choudhary
    Abstract:

    The increasing gap between processor cycle times and access times to storage devices makes it necessary to use powerful optimizations. This is especially true for applications in the parallel computing domain that frequently perform large amounts of file I/O. The collective I/O strategy, in which processes coordinate to perform I/O on each other's behalf, has demonstrated significant performance improvements. This article proposes a new concept called Multicollective I/O (MCIO) that expands collective I/O to allow data from multiple files to be requested in a single I/O request, in contrast to allowing only multiple segments of a single file to be specified together. MCIO considers multiple arrays simultaneously by having a more global view of the overall I/O behavior exhibited by parallel applications. This article shows that determining the optimal MCIO access pattern is an NP-complete problem, and proposes two different heuristics for the access pattern detection problem, also called the assignment problem. Both heuristics have been implemented within a Runtime Library, and tested using a large-scale scientific application. Our results show that MCIO outperforms collective I/O by as much as 87%. Our Runtime Library-based implementation can be used by application users as well as by optimizing compilers. Based on our results, we recommend that designers of future libraries for I/O-intensive applications include MCIO in their suites of optimizations.
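
    For reference, the baseline that MCIO improves on: two arrays in two files take two separate collective reads, one per file, and MCIO's contribution is to schedule both in a single request. A minimal MPI-IO rendering of that baseline follows (file names are placeholders, and the files are assumed to exist and be large enough).

    ```c
    /* Conventional collective I/O over two files: one coordinated read per
     * file. MCIO would consider both requests together. */
    #include <mpi.h>
    #include <stdio.h>

    #define COUNT 1024

    int main(int argc, char **argv) {
        int rank;
        double a[COUNT], b[COUNT];
        double *bufs[2];
        const char *files[2] = {"array0.dat", "array1.dat"}; /* placeholders */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        bufs[0] = a; bufs[1] = b;

        for (int i = 0; i < 2; i++) {        /* one collective read per file */
            MPI_File fh;
            MPI_File_open(MPI_COMM_WORLD, files[i], MPI_MODE_RDONLY,
                          MPI_INFO_NULL, &fh);
            MPI_File_read_at_all(fh,
                                 (MPI_Offset)rank * COUNT * sizeof(double),
                                 bufs[i], COUNT, MPI_DOUBLE,
                                 MPI_STATUS_IGNORE);
            MPI_File_close(&fh);
        }
        printf("rank %d read both arrays\n", rank);
        MPI_Finalize();
        return 0;
    }
    ```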

  • FAST - Exploiting inter-file access patterns using multi-collective I/O
    2002
    Co-Authors: Gokhan Memik, Mahmut Kandemir, Alok Choudhary
    Abstract:

    This paper introduces a new concept called Multi-Collective I/O (MCIO) that extends conventional collective I/O to optimize I/O accesses to multiple arrays simultaneously. In this approach, as in collective I/O, multiple processors coordinate to perform I/O on behalf of each other if doing so improves overall I/O time. However, unlike collective I/O, MCIO considers multiple arrays simultaneously; that is, it has a more global view of the overall I/O behavior exhibited by the application. This paper shows that determining the optimal MCIO access pattern is an NP-complete problem, and proposes two different heuristics for the access pattern detection problem (also called the assignment problem). Both heuristics have been implemented within a Runtime Library, and tested using a large-scale scientific application. Our preliminary results show that MCIO outperforms collective I/O by as much as 87%. Our Runtime Library-based implementation can be used by users as well as by optimizing compilers. Based on our results, we recommend that designers of future libraries for I/O-intensive applications include MCIO in their suite of optimizations.
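
    The flavor of a heuristic for the assignment problem can be conveyed with a greedy rule: hand each array's I/O to the processor group currently carrying the least load. This is one of many possible heuristics and not necessarily either of the paper's two; the sizes and group count below are made up.

    ```c
    /* Greedy assignment of arrays to processor groups by least current load.
     * Illustrative heuristic for the assignment problem. */
    #include <stdio.h>

    #define NARRAYS 6
    #define NGROUPS 2

    int main(void) {
        double size[NARRAYS] = {40, 10, 25, 30, 5, 20};  /* MB per array */
        double load[NGROUPS] = {0};
        int assign[NARRAYS];

        for (int a = 0; a < NARRAYS; a++) {
            int best = 0;
            for (int g = 1; g < NGROUPS; g++)
                if (load[g] < load[best]) best = g;
            assign[a] = best;          /* least-loaded group gets this array */
            load[best] += size[a];
        }
        for (int a = 0; a < NARRAYS; a++)
            printf("array %d (%.0f MB) -> group %d\n", a, size[a], assign[a]);
        printf("group loads: %.0f MB, %.0f MB\n", load[0], load[1]);
        return 0;
    }
    ```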

  • IEEE Symposium on Mass Storage Systems - APRIL: A Run-Time Library for Tape-Resident Data.
    2000
    Co-Authors: Gokhan Memik, Alok Choudhary, Mahmut Kandemir, Valerie Taylor
    Abstract:

    Over the last decade, processors have made enormous gains in speed, but the speed of secondary and tertiary storage devices has not kept pace. As a result, secondary and tertiary storage access times dominate the execution time of data-intensive computations. Therefore, in scientific computations, efficient access to data stored in secondary and tertiary storage is a must. In this paper, we give an overview of APRIL, a parallel Runtime Library that can be used in applications that process tape-resident data. We present its user interface and the underlying optimization strategy. We also discuss performance improvements provided by the Library on the High Performance Storage System (HPSS). The preliminary results reveal that the optimizations can improve response times by up to 97.2%.
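
    APRIL's interface and optimization strategy are not reproduced here, but the core of any tape optimization is turning many small user reads into a few large sequential transfers. A toy staging buffer illustrating that effect (CHUNK, small_read, and the in-memory "tape" are all stand-ins):

    ```c
    /* Staging buffer: small reads are served from memory, and a whole chunk
     * is fetched from the "tape" only on a miss. */
    #include <stdio.h>
    #include <string.h>

    #define CHUNK 4096                 /* one large tape transfer */

    static char tape[CHUNK * 4];       /* stand-in for the tape-resident file */
    static char stage[CHUNK];
    static long stage_base = -1;       /* offset currently held in the buffer */

    static void small_read(long off, char *dst, int len) {
        if (stage_base < 0 || off < stage_base ||
            off + len > stage_base + CHUNK) {
            stage_base = (off / CHUNK) * CHUNK;       /* align to chunk */
            memcpy(stage, tape + stage_base, CHUNK);  /* the "tape" transfer */
            printf("tape transfer at offset %ld\n", stage_base);
        }
        memcpy(dst, stage + (off - stage_base), len);
    }

    int main(void) {
        char buf[64];
        for (long off = 0; off < 3 * CHUNK; off += 512)
            small_read(off, buf, 64);  /* 24 small reads -> 3 tape transfers */
        return 0;
    }
    ```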

  • PARCO - Run-Time Library for Parallel I/O for Irregular Applications
    Advances in Parallel Computing, 1998
    Co-Authors: Alok Choudhary
    Abstract:

    We present a Runtime Library design based on the two-phase collective I/O technique for irregular applications and show performance results on the Intel Paragon. We obtained up to 40 MBytes/sec application-level performance on Caltech's Intel Paragon (with 16 I/O nodes, each containing one disk), a figure that includes on-the-fly reordering costs. We observed up to 60 MBytes/sec on the ASCI Red teraflops machine with only three I/O nodes (with RAIDs).
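
    Two-phase collective I/O in miniature: in the first phase processes redistribute an unfavorable (here, cyclic) data layout into contiguous blocks, and in the second each issues one large collective request. A minimal MPI sketch, assuming the local element count is divisible by the number of processes:

    ```c
    /* Two-phase collective write: MPI_Alltoall turns a cyclic layout into
     * blocks (phase 1), then one contiguous collective write per process
     * (phase 2). Assumes NLOCAL is divisible by the number of processes. */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 8

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int chunk = NLOCAL / nprocs;
        double mine[NLOCAL], recv[NLOCAL], block[NLOCAL];

        /* Cyclic ownership: local element k is global element rank + k*nprocs. */
        for (int k = 0; k < NLOCAL; k++)
            mine[k] = (double)(rank + k * nprocs);

        /* Phase 1: consecutive runs of k map to consecutive block owners. */
        MPI_Alltoall(mine, chunk, MPI_DOUBLE, recv, chunk, MPI_DOUBLE,
                     MPI_COMM_WORLD);
        for (int r = 0; r < nprocs; r++)          /* unpack into file order */
            for (int j = 0; j < chunk; j++)
                block[r + j * nprocs] = recv[r * chunk + j];

        /* Phase 2: one contiguous collective write per process. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, (MPI_Offset)rank * NLOCAL * sizeof(double),
                              block, NLOCAL, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }
    ```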

  • PASSION Runtime Library for the Intel Paragon
    1995
    Co-Authors: Alok Choudhary, Rajesh Bordawekar, Sachin More
    Abstract:

    We are developing a Runtime Library which provides a number of routines to perform the I/O required in parallel applications in an efficient and convenient manner. This is part of a project called PASSION, which aims to provide software support for high-performance parallel I/O at the compiler, Runtime and file system levels. The PASSION Runtime Library uses a high-level interface which makes it easy for the user to specify the I/O required in the program. The user only needs to specify what portion of the data structure needs to be read from or written to the file, and the PASSION routines will perform all the necessary I/O efficiently. This paper gives an overview of the PASSION Runtime Library and describes in detail its high-level interface.
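
    What such a high-level interface has to do underneath can be sketched for the 2-D case: a rectangular section of an array maps to one strided file read per row. read_section below is a hypothetical stand-in written with plain C stdio, not PASSION's actual routine.

    ```c
    /* Hypothetical "read this section" call: rows [r0,r1) x cols [c0,c1) of
     * an ncols-wide matrix of doubles become strided file reads. */
    #include <stdio.h>

    static int read_section(FILE *f, int ncols, int r0, int r1, int c0,
                            int c1, double *out) {
        for (int r = r0; r < r1; r++) {
            long off = ((long)r * ncols + c0) * sizeof(double);
            if (fseek(f, off, SEEK_SET) != 0) return -1;
            if (fread(out, sizeof(double), c1 - c0, f) != (size_t)(c1 - c0))
                return -1;
            out += c1 - c0;            /* pack rows contiguously in memory */
        }
        return 0;
    }

    int main(void) {
        /* Write a 4x4 test matrix, then read back its 2x2 centre section. */
        FILE *f = fopen("mat.dat", "w+b");
        for (int i = 0; i < 16; i++) {
            double v = i;
            fwrite(&v, sizeof v, 1, f);
        }
        double sec[4];
        read_section(f, 4, 1, 3, 1, 3, sec);
        printf("%.0f %.0f / %.0f %.0f\n", sec[0], sec[1], sec[2], sec[3]);
        fclose(f);
        return 0;
    }
    ```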

Antonio Lain - One of the best experts on this subject based on the ideXlab platform.

  • IPPS/SPDP - Evaluation of Compiler and Runtime Library Approaches for Supporting Parallel Regular Applications
    Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, 1998
    Co-Authors: D R Chakrabarti, P Banerjee, Antonio Lain
    Abstract:

    Important applications including those in computational chemistry, computational fluid dynamics, structural analysis and sparse matrix applications usually consist of a mixture of regular and irregular accesses. While current state-of-the-art run-time Library support for such applications handles the irregular accesses reasonably well, the efficacy of the optimizations at run-time for the regular accesses is yet to be proven. This paper aims to find a better approach to handle the above applications in a unified compiler and run-time framework. Specifically, this paper considers only regular applications and evaluates the performance of two approaches, a run-time approach using PILAR and a compile-time approach using a commercial HPF compiler. This study shows that, with a suitable representation of regular accesses, the performance of regular code using run-time libraries can come close to the performance of code generated by a compiler. It also determines the operations that usually contribute most to the run-time overhead in the case of regular accesses. Experimental results are reported for three regular applications on a 16-processor IBM SP-2.
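
    Runtime libraries of this family typically follow the inspector-executor pattern: an inspector pass analyses the access pattern once and builds a schedule, which the executor then reuses on every iteration. A single-process sketch with no communication follows; the data structures are illustrative assumptions, not PILAR's.

    ```c
    /* Inspector-executor in miniature: pay the analysis once, reuse the
     * schedule across time steps. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 8

    typedef struct { int n; int *fetch; } schedule_t; /* precomputed accesses */

    /* Inspector: record which indirect elements the loop will touch. */
    static schedule_t inspect(const int *idx, int n) {
        schedule_t s = { n, malloc(n * sizeof(int)) };
        for (int i = 0; i < n; i++) s.fetch[i] = idx[i];
        return s;
    }

    /* Executor: run the loop using the precomputed schedule. */
    static void execute(const schedule_t *s, const double *x, double *y) {
        for (int i = 0; i < s->n; i++) y[i] += x[s->fetch[i]];
    }

    int main(void) {
        int idx[N] = {3, 1, 4, 1, 5, 2, 6, 0};   /* indirect access pattern */
        double x[N] = {0, 1, 2, 3, 4, 5, 6, 7}, y[N] = {0};

        schedule_t s = inspect(idx, N);          /* paid once */
        for (int t = 0; t < 3; t++)              /* reused across time steps */
            execute(&s, x, y);

        printf("y[0] = %.0f\n", y[0]);           /* 3 * x[3] = 9 */
        free(s.fetch);
        return 0;
    }
    ```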
