Global Address Space

The experts below are selected from a list of 1,959 experts worldwide, ranked by the ideXlab platform.

P. Sadayappan - One of the best experts on this subject based on the ideXlab platform.

  • Work stealing for GPU-accelerated parallel programs in a Global Address Space framework
    Concurrency and Computation: Practice and Experience, 2016
    Co-Authors: Humayun Arafat, Sriram Krishnamoorthy, James Dinan, Pavan Balaji, P. Sadayappan
    Abstract:

    Task parallelism is an attractive approach to automatically load balancing the computation in a parallel system and adapting to the dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared- and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a Global Address Space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work-stealing algorithm for CPU-GPU systems, taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations, as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain. Copyright © 2016 John Wiley & Sons, Ltd.
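
    A minimal sketch of the kind of distributed task pool with work stealing the abstract describes, assuming MPI-3 RMA in place of the Global Arrays/ARMCI layer the paper builds on; run_on_cpu, run_on_gpu, and the dispatch rule are hypothetical stand-ins for the size-based CPU/GPU choice.

      // Distributed work stealing over per-rank task counters (sketch).
      // MPI-3 RMA stands in for the Global Arrays/ARMCI layer of the paper;
      // run_on_cpu/run_on_gpu are hypothetical placeholders.
      #include <mpi.h>

      static void run_on_cpu(long task) { (void)task; /* placeholder: execute a small task */ }
      static void run_on_gpu(long task) { (void)task; /* placeholder: execute a large task */ }

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int me, np;
          MPI_Comm_rank(MPI_COMM_WORLD, &me);
          MPI_Comm_size(MPI_COMM_WORLD, &np);

          const long tasks_per_rank = 1000;   // each rank owns a block of task ids
          long *counter;                      // next unclaimed local task, visible to all
          MPI_Win win;
          MPI_Win_allocate(sizeof(long), sizeof(long), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &counter, &win);
          *counter = 0;
          MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
          MPI_Barrier(MPI_COMM_WORLD);

          // Drain my own queue first, then steal from each victim in turn.
          for (int tried = 0; tried < np; ++tried) {
              int victim = (me + tried) % np;
              while (true) {
                  long idx, one = 1;
                  MPI_Fetch_and_op(&one, &idx, MPI_LONG, victim, 0, MPI_SUM, win);
                  MPI_Win_flush(victim, win);
                  if (idx >= tasks_per_rank) break;          // victim has no work left
                  long task = victim * tasks_per_rank + idx; // claimed task id
                  // Stand-in for the size-based dispatch: large tasks go to the GPU.
                  if (task % 4 == 0) run_on_gpu(task); else run_on_cpu(task);
              }
          }

          MPI_Win_unlock_all(win);
          MPI_Barrier(MPI_COMM_WORLD);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }

    Stealing is just a remote fetch-and-add on the victim's counter, which is exactly the kind of one-sided operation a global address space runtime makes cheap; the data-movement question raised in the abstract is whether the stolen task's inputs then have to move between CPU and GPU memory domains.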

  • HiPC - A Global Address Space approach to automated data management for parallel Quantum Monte Carlo applications
    2012 19th International Conference on High Performance Computing, 2012
    Co-Authors: James Dinan, Sravya Tirukkovalur, Lubos Mitas, Lucas K. Wagner, P. Sadayappan
    Abstract:

    Quantum Monte Carlo (QMC) applications perform simulation with respect to an initial state of the quantum mechanical system, which is often captured by using a cubic B-spline basis. This representation is stored as a read-only table of coefficients, and accesses to the table are generated at random as part of the Monte Carlo simulation. Current QMC applications, such as QWalk and QMCPACK, replicate this table at every process or node, which limits scalability because increasing the number of processors does not enable larger systems to be run. We present a partitioned Global Address Space (PGAS) approach to transparently managing this data using Global Arrays in a manner that allows the memory of multiple nodes to be aggregated. We develop an automated data management system that significantly reduces communication overheads, enabling new capabilities for QMC codes. Experimental results with the QWalk application demonstrate the effectiveness of the data management system.
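
    A minimal sketch of the data management idea, assuming MPI-3 RMA as a stand-in for the Global Arrays interface the paper actually uses: the coefficient table is partitioned across ranks instead of replicated, and a random lookup becomes a one-sided read from whichever rank owns the index. The table size and index calculation are illustrative only.

      // Partitioned, read-only coefficient table accessed with one-sided reads
      // (sketch).  The paper uses Global Arrays; MPI-3 RMA is an illustration.
      #include <mpi.h>
      #include <cstdio>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int me, np;
          MPI_Comm_rank(MPI_COMM_WORLD, &me);
          MPI_Comm_size(MPI_COMM_WORLD, &np);

          const long table_size = 1 << 20;              // total coefficients (illustrative;
          const long local_n    = table_size / np;      //  assumes np divides the table size)
          double *slice;
          MPI_Win win;
          MPI_Win_allocate(local_n * sizeof(double), sizeof(double),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &slice, &win);
          for (long i = 0; i < local_n; ++i)            // fill only my slice of the table
              slice[i] = 0.001 * (me * local_n + i);
          MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
          MPI_Barrier(MPI_COMM_WORLD);

          // One table lookup, as a Monte Carlo walker would generate at random.
          long idx   = (me * 7919L + 13) % table_size;  // arbitrary global index
          int  owner = (int)(idx / local_n);            // rank holding that index
          double coeff;
          MPI_Get(&coeff, 1, MPI_DOUBLE, owner, idx % local_n, 1, MPI_DOUBLE, win);
          MPI_Win_flush(owner, win);
          std::printf("rank %d read coefficient[%ld] = %f from rank %d\n",
                      me, idx, coeff, owner);

          MPI_Win_unlock_all(win);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }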

  • CLUSTER - Non-collective parallel I/O for Global Address Space programming models
    2007 IEEE International Conference on Cluster Computing, 2007
    Co-Authors: S. Krishnamoorthy, J. Nieplocha, Vinod Tipparaju, Juan Piernas Canovas, P. Sadayappan
    Abstract:

    Achieving high performance for out-of-core applications typically involves explicit management of the movement of data between the disk and the physical memory. We are developing a programming environment in which the different levels of the memory hierarchy are handled efficiently in a unified, transparent framework. In this paper, we present our experiences with implementing efficient non-collective I/O (GPC-IO) as part of this framework. As a generalization of the remote procedure call (RPC) that served as a foundation for the Sun NFS system, we developed a global procedure call (GPC) to invoke procedures on a remote node to handle non-collective I/O. We consider alternative approaches that can be employed in implementing this functionality. The approaches are evaluated using a representative computation from quantum chemistry. The results demonstrate that GPC-IO achieves better absolute execution times, strong scaling, and weak scaling than the alternatives considered.
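
    A hedged sketch of the global procedure call (GPC) pattern: the requesting process ships a small read procedure to the node that owns the file fragment, so no other process has to participate. The paper implements its own GPC layer inside a GAS runtime; UPC++'s rpc and the node-local file path below are stand-ins chosen only to keep the example self-contained.

      // "Global procedure call" for non-collective reads (sketch).  upcxx::rpc
      // and the node-local path are stand-ins; the paper builds its own GPC
      // layer rather than using UPC++.
      #include <upcxx/upcxx.hpp>
      #include <cstdio>
      #include <string>
      #include <vector>

      // Runs on the owner of the fragment: read `count` doubles at `offset`
      // from a node-local file and ship them back to the caller.
      static std::vector<double> read_fragment(long offset, long count) {
          std::vector<double> buf((size_t)count, 0.0);
          std::string path = "/local/scratch/frag." + std::to_string(upcxx::rank_me());
          if (FILE *f = std::fopen(path.c_str(), "rb")) {
              std::fseek(f, offset * (long)sizeof(double), SEEK_SET);
              size_t got = std::fread(buf.data(), sizeof(double), (size_t)count, f);
              buf.resize(got);
              std::fclose(f);
          }
          return buf;
      }

      int main() {
          upcxx::init();
          int owner = (upcxx::rank_me() + 1) % upcxx::rank_n();  // whoever owns the data

          // Non-collective: only this process and the owner take part in the read.
          auto fut = upcxx::rpc(owner, read_fragment, 0L, 128L);
          std::vector<double> data = fut.wait();
          std::printf("rank %d fetched %zu doubles from rank %d\n",
                      upcxx::rank_me(), data.size(), owner);

          upcxx::barrier();
          upcxx::finalize();
          return 0;
      }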

  • A Global Address Space framework for locality aware scheduling of block-sparse computations
    2007 IEEE International Parallel and Distributed Processing Symposium, 2007
    Co-Authors: Sriram Krishnamoorthy, Jarek Nieplocha, Atanas Rountev, Umit Catalyurek, P. Sadayappan
    Abstract:

    In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a Global Address Space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes.
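
    A toy sketch of the staged execution described above, with a trivial round-robin partition_tasks standing in for the paper's hypergraph partitioning formulation (which is the actual contribution): tasks are grouped into stages, each stage's blocks are brought into global memory once, and the number of distinct blocks loaded serves as a proxy for disk I/O.

      // Staged execution of block-sparse tasks (sketch).  partition_tasks is a
      // hypothetical placeholder for the paper's hypergraph partitioner.
      #include <cstdio>
      #include <set>
      #include <vector>

      struct Task { std::vector<int> blocks; };          // data blocks a task touches

      // Stand-in for the partitioner: assign each task to a stage round-robin.
      static std::vector<int> partition_tasks(const std::vector<Task> &tasks, int stages) {
          std::vector<int> stage(tasks.size());
          for (size_t i = 0; i < tasks.size(); ++i) stage[i] = (int)(i % stages);
          return stage;
      }

      int main() {
          std::vector<Task> tasks = {{{0, 1}}, {{1, 2}}, {{0, 2}}, {{3, 4}}, {{3, 5}}};
          int nstages = 2;
          std::vector<int> stage = partition_tasks(tasks, nstages);

          long blocks_loaded = 0;                         // proxy for disk I/O volume
          for (int s = 0; s < nstages; ++s) {
              std::set<int> resident;                     // blocks staged into global memory
              for (size_t i = 0; i < tasks.size(); ++i)
                  if (stage[i] == s)
                      for (int b : tasks[i].blocks) resident.insert(b);
              blocks_loaded += (long)resident.size();     // each block read once per stage
              // ... execute the tasks of stage s against the resident blocks ...
          }
          std::printf("blocks loaded from disk: %ld\n", blocks_loaded);
          return 0;
      }

    A good partition keeps tasks that share blocks in the same stage, so the same block is not re-read from disk in several stages; that is the cost the hypergraph formulation minimizes.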

Katherine Yelick - One of the best experts on this subject based on the ideXlab platform.

  • ARRAY@PLDI - A Local-View Array Library for Partitioned Global Address Space C++ Programs
    Proceedings of ACM SIGPLAN International Workshop on Libraries Languages and Compilers for Array Programming - ARRAY'14, 2014
    Co-Authors: Amir Kamil, Yili Zheng, Katherine Yelick
    Abstract:

    Multidimensional arrays are an important data structure in many scientific applications. Unfortunately, built-in support for such arrays is inadequate in C++, particularly in the distributed setting where bulk communication operations are required for good performance. In this paper, we present a multidimensional library for partitioned Global Address Space (PGAS) programs, supporting the one-sided remote access and bulk operations of the PGAS model. The library is based on Titanium arrays, which have proven to provide good productivity and performance. These arrays provide a local view of data, where each rank constructs its own portion of a global data structure, matching the local view of execution common to PGAS programs and providing maximum flexibility in structuring global data. Unlike Titanium, which has its own compiler with array-specific analyses, optimizations, and code generation, we implement multidimensional arrays solely through a C++ library. The main goal of this effort is to provide a library-based implementation that can match the productivity and performance of a compiler-based approach. We implement the array library as an extension to UPC++, a C++ library for PGAS programs, and we extend Titanium arrays with specializations to improve performance. We evaluate the array library by porting four Titanium benchmarks to UPC++, demonstrating that it can achieve up to 25% better performance than Titanium without a significant increase in programmer effort.
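
    A minimal sketch of the local-view model, using only core UPC++ primitives rather than the multidimensional array library the paper presents: each rank allocates and fills its own tile, publishes its base pointer through a dist_object, and pulls a neighbor's boundary strip with a one-sided bulk read. Tile size and ghost width are illustrative.

      // Local-view tiles with one-sided ghost exchange (sketch).  Core UPC++
      // primitives only; the paper's array library is not reproduced here.
      #include <upcxx/upcxx.hpp>
      #include <cstdio>

      int main() {
          upcxx::init();
          int me = upcxx::rank_me(), np = upcxx::rank_n();
          const size_t tile = 1024;                      // interior points per rank

          // Local view: every rank allocates and initializes only its own tile.
          upcxx::global_ptr<double> mine = upcxx::new_array<double>(tile);
          double *local = mine.local();
          for (size_t i = 0; i < tile; ++i) local[i] = me + 0.001 * i;

          // Make each tile's base pointer visible to the other ranks.
          upcxx::dist_object<upcxx::global_ptr<double>> dir(mine);
          upcxx::barrier();

          // Bulk one-sided read of the right neighbor's first 8 values (a ghost strip).
          int right = (me + 1) % np;
          upcxx::global_ptr<double> rtile = dir.fetch(right).wait();
          double ghost[8];
          upcxx::rget(rtile, ghost, 8).wait();
          std::printf("rank %d: neighbor %d starts with %f\n", me, right, ghost[0]);

          upcxx::barrier();
          upcxx::delete_array(mine);
          upcxx::finalize();
          return 0;
      }

    The point of the local view is visible even in this toy: no rank ever describes the whole global grid, only its own tile and the remote pointers it needs.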

  • Tuning collective communication for partitioned Global Address Space programming models
    Parallel Computing, 2011
    Co-Authors: Rajesh Nishtala, Paul Hargrove, Yili Zheng, Katherine Yelick
    Abstract:

    Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared-memory programming style combined with the locality control necessary to run on large-scale distributed-memory systems. Even within a PGAS language, programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this paper we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and the implementation. In particular, PGAS collectives raise semantic issues that differ from those in send-receive style message-passing programs, and admit different implementation approaches that take advantage of the one-sided communication style of these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory, and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. Finally, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and show that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines, including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with an InfiniBand interconnect.
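
    A hedged illustration of the automatic tuning such a framework performs: two broadcast algorithms (a flat fan-out and a binomial tree) are timed on the target machine and the faster one is selected. GASNet's real collectives are one-sided and far richer; plain MPI point-to-point is used here only to keep the tuning loop self-contained.

      // Selecting among broadcast algorithms by measurement (sketch).
      #include <mpi.h>
      #include <cstdio>
      #include <vector>

      static void bcast_flat(double *buf, int n, MPI_Comm comm) {
          int me, np; MPI_Comm_rank(comm, &me); MPI_Comm_size(comm, &np);
          if (me == 0)
              for (int r = 1; r < np; ++r) MPI_Send(buf, n, MPI_DOUBLE, r, 0, comm);
          else
              MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, comm, MPI_STATUS_IGNORE);
      }

      static void bcast_binomial(double *buf, int n, MPI_Comm comm) {
          int me, np; MPI_Comm_rank(comm, &me); MPI_Comm_size(comm, &np);
          int mask = 1;
          while (mask < np) mask <<= 1;            // smallest power of two >= np
          // Each nonzero rank receives once from its parent, then forwards down the tree.
          for (int step = mask >> 1; step >= 1; step >>= 1) {
              if (me % (2 * step) == step)
                  MPI_Recv(buf, n, MPI_DOUBLE, me - step, 0, comm, MPI_STATUS_IGNORE);
              if (me % (2 * step) == 0 && me + step < np)
                  MPI_Send(buf, n, MPI_DOUBLE, me + step, 0, comm);
          }
      }

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int me; MPI_Comm_rank(MPI_COMM_WORLD, &me);
          std::vector<double> buf(4096, me == 0 ? 1.0 : 0.0);

          void (*candidates[])(double *, int, MPI_Comm) = {bcast_flat, bcast_binomial};
          const char *names[] = {"flat", "binomial"};
          double best = 1e30; int pick = 0;

          for (int c = 0; c < 2; ++c) {            // benchmark each algorithm
              MPI_Barrier(MPI_COMM_WORLD);
              double t0 = MPI_Wtime();
              for (int rep = 0; rep < 10; ++rep)
                  candidates[c](buf.data(), (int)buf.size(), MPI_COMM_WORLD);
              double t = MPI_Wtime() - t0, tmax;
              MPI_Allreduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
              if (tmax < best) { best = tmax; pick = c; }
          }
          if (me == 0) std::printf("selected %s broadcast (%.6f s)\n", names[pick], best);
          MPI_Finalize();
          return 0;
      }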

  • Porting GASNet to Portals: partitioned Global Address Space (PGAS) language support for the Cray XT
    2009
    Co-Authors: Dan Bonachea, Paul Hargrove, Michael Welcome, Katherine Yelick
    Abstract:

    Partitioned Global Address Space (PGAS) languages are an emerging alternative to MPI for HPC application development. The GASNet library from Lawrence Berkeley National Lab and the University of California at Berkeley provides the network runtime for multiple implementations of four PGAS languages: Unified Parallel C (UPC), Co-Array Fortran (CAF), Titanium, and Chapel. GASNet provides a low-overhead one-sided communication layer that has enabled portability and high performance of PGAS languages. This paper describes our experiences porting GASNet to the Portals network API on the Cray XT series.

  • Productivity and performance using partitioned Global Address Space languages
    Parallel Symbolic Computation, 2007
    Co-Authors: Katherine Yelick, Dan Bonachea, Weiyu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L Graham, Paul Hargrove, Paul N Hilfinger, Parry Husbands
    Abstract:

    Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC), is an extension of ISO C defined by a consortium that boasts multiple proprietary and open-source compilers. Another PGAS language, Titanium, is a dialect of Java designed for high-performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable high-performance compilers that run on a large variety of shared- and distributed-memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages.

Alan D George - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Performance Wizard: a performance system for the analysis of partitioned Global Address Space applications
    IEEE International Conference on High Performance Computing Data and Analytics, 2010
    Co-Authors: Hunghsun Su, Max Billingsley, Alan D George
    Abstract:

    Given the complexity of high-performance parallel programs, developers often must rely on performance analysis tools to help them improve the performance of their applications. While many tools support analysis of message-passing programs, tool support is limited for applications written in programming models that present a partitioned Global Address Space (PGAS) to the programmer, such as UPC and SHMEM. Existing tools that support message-passing models are difficult to extend to support PGAS models due to differences between the two paradigms and the techniques used in their implementations. In this paper, we present our work on Parallel Performance Wizard (PPW), a performance analysis system for PGAS and MPI application analysis. We discuss new concepts, namely the generic-operation-type abstraction and GASP-enabled data collection, developed to facilitate support for multiple programming models, and then give an overview of PPW’s automatic analysis and visualization capabilities. Finally, to show the usefulness of our system, we present results on PPW’s overhead, storage requirements, and scalability before demonstrating its effectiveness via application case studies.

  • Parallel performance wizard: A performance analysis tool for partitioned Global-Address-Space programming
    2008 IEEE International Symposium on Parallel and Distributed Processing, 2008
    Co-Authors: Hunghsun Su, Max Billingsley, Alan D George
    Abstract:

    Given the complexity of parallel programs, developers often must rely on performance analysis tools to help them improve the performance of their code. While many tools support the analysis of message-passing programs, no tool exists that fully supports programs written in programming models that present a partitioned Global Address Space (PGAS) to the programmer, such as UPC and SHMEM. Existing tools with support for message-passing models cannot be easily extended to support PGAS programming models, due to the differences between these paradigms. Furthermore, the inclusion of implicit and one-sided communication in PGAS models renders many of the analyses performed by existing tools irrelevant. For these reasons, there exists a need for a new performance tool capable of handling the challenges associated with PGAS models. In this paper, we first present background research and the framework for Parallel Performance Wizard (PPW), a modularized, event-based performance analysis tool for PGAS programming models. We then discuss features of PPW and how they are used in the analysis of PGAS applications. Finally, we illustrate how one would use PPW in the analysis and optimization of PGAS applications by presenting a small case study using the PPW version 1.0 implementation.

  • Parallel Performance Wizard: a performance analysis tool for partitioned Global Address Space programming models
    Conference on High Performance Computing (Supercomputing), 2006
    Co-Authors: Adam Leko, Dan Bonachea, Hunghsun Su, Bryan Golden, Max Billingsley, Alan D George
    Abstract:

    Scientific programmers must optimize the total time-to-solution, the combination of software development and refinement time and actual execution time. The increasing complexity at all levels of supercomputing architectures, coupled with advancements in sequential performance and a growing degree of hardware parallelism, has increasingly placed the bulk of the time-to-solution cost into the software development and tuning phase. Performance analysis tools have been useful for reducing the time-to-solution for message-passing applications; however, there is insufficient tool support for programs developed using Global-Address-Space (GAS) programming models. With the aim of maximizing user productivity, the Parallel Performance Wizard (PPW) fills this void by providing a full range of visualizations and analyses specifically designed for GAS models. To facilitate accurate instrumentation and measurement of GAS programs in PPW, a portable, model-independent performance tool interface (GASP) has been developed and successfully used with Berkeley UPC.

  • GASP: A Performance Analysis Tool Interface for Global Address Space Programming Models, Version 1.5
    Lawrence Berkeley National Laboratory, 2006
    Co-Authors: Adam Leko, Dan Bonachea, Hunghsun Su, Alan D George, Hans Sherburne
    Abstract:

    Due to the wide range of compilers and the lack of a standardized performance tool interface, writers of performance tools face many challenges when incorporating support for Global Address Space (GAS) programming models such as Unified Parallel C (UPC), Titanium, and Co-Array Fortran (CAF). This document presents a Global Address Space Performance tool interface (GASP) that is flexible enough to be adapted into current Global Address Space compiler and runtime infrastructures with little effort, while allowing performance analysis tools to gather much information about the performance of Global Address Space programs.
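
    A small sketch of the callback pattern that a GASP-like interface standardizes: the compiler or runtime emits enter/exit notifications around each communication operation and the tool aggregates them through a single entry point. Every name below is illustrative; it is not the actual GASP API.

      // Event-notification interface between a GAS runtime and a performance
      // tool (sketch).  All names are hypothetical.
      #include <cstdio>
      #include <map>
      #include <string>
      #include <chrono>

      enum class EventType { Enter, Exit };

      // Tool side: accumulate time per operation type ("put", "get", "barrier", ...).
      static std::map<std::string, double> total_seconds;
      static std::map<std::string, std::chrono::steady_clock::time_point> open_events;

      void tool_event_notify(const std::string &op, EventType type,
                             const char *file, int line) {
          auto now = std::chrono::steady_clock::now();
          if (type == EventType::Enter) {
              open_events[op] = now;
          } else {
              std::chrono::duration<double> d = now - open_events[op];
              total_seconds[op] += d.count();
              std::printf("%s at %s:%d took %.6f s\n", op.c_str(), file, line, d.count());
          }
      }

      // Runtime side: what an instrumented one-sided put would look like.
      void instrumented_put(/* ...target, data... */) {
          tool_event_notify("put", EventType::Enter, __FILE__, __LINE__);
          /* ... perform the actual one-sided put ... */
          tool_event_notify("put", EventType::Exit, __FILE__, __LINE__);
      }

      int main() {
          instrumented_put();
          return 0;
      }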

Dan Bonachea - One of the best experts on this subject based on the ideXlab platform.

  • Porting GASNet to Portals: partitioned Global Address Space (PGAS) language support for the Cray XT
    2009
    Co-Authors: Dan Bonachea, Paul Hargrove, Michael Welcome, Katherine Yelick
    Abstract:

    Partitioned Global Address Space (PGAS) languages are an emerging alternative to MPI for HPC application development. The GASNet library from Lawrence Berkeley National Lab and the University of California at Berkeley provides the network runtime for multiple implementations of four PGAS languages: Unified Parallel C (UPC), Co-Array Fortran (CAF), Titanium, and Chapel. GASNet provides a low-overhead one-sided communication layer that has enabled portability and high performance of PGAS languages. This paper describes our experiences porting GASNet to the Portals network API on the Cray XT series.

  • PASCO - Productivity and performance using partitioned Global Address Space languages
    Proceedings of the 2007 international workshop on Parallel symbolic computation - PASCO '07, 2007
    Co-Authors: Katherine Yelick, Dan Bonachea, Weiyu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L Graham, Paul Hargrove, Paul N Hilfinger, Parry Husbands
    Abstract:

    Partitioned Global Address Space (PGAS) languages combine the programming convenience of shared memory with the locality and performance control of message passing. One such language, Unified Parallel C (UPC), is an extension of ISO C defined by a consortium that boasts multiple proprietary and open-source compilers. Another PGAS language, Titanium, is a dialect of Java designed for high-performance scientific computation. In this paper we describe some of the highlights of two related projects, the Titanium project centered at U.C. Berkeley and the UPC project centered at Lawrence Berkeley National Laboratory. Both compilers use a source-to-source strategy that translates the parallel languages to C with calls to a communication layer called GASNet. The result is portable high-performance compilers that run on a large variety of shared- and distributed-memory multiprocessors. Both projects combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages.

  • Automatic nonblocking communication for partitioned Global Address Space programs
    International Conference on Supercomputing, 2007
    Co-Authors: Weiyu Chen, Dan Bonachea, Costin Iancu, Katherine Yelick
    Abstract:

    Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power far outpaces any improvements in communication latency. PGAS languages offer unique opportunities for communication overlap, because their one-sided communication model enables low overhead data transfer. Recent results have shown the value of hiding latency by manually applying language-level nonblocking data transfer routines, but this process can be both tedious and error-prone. In this paper, we present a runtime framework that automatically schedules the data transfers to achieve overlap. The optimization framework is entirely transparent to the user, and aggressively reorders and aggregates both remote puts and gets. We preserve correctness via runtime conflict checks and temporary buffers, using several techniques to lower the overhead. Experimental results on application benchmarks suggest that our framework can be very effective at hiding communication latency on clusters, improving performance over the blocking code by an average of 16% for some of the NAS Parallel Benchmarks, 48% for GUPS, and over 25% for a multi-block fluid dynamics solver. While the system is not yet as effective as aggressive manual optimization, it increases programmers' productivity by freeing them from the details of communication management.
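
    A hedged sketch of the transformation the runtime applies automatically, written out by hand with MPI-3 request-based RMA as a stand-in for the UPC/GASNet machinery: the blocking remote read is hoisted and issued as a nonblocking MPI_Rget, independent computation overlaps the transfer, and the operation is completed only at the point of use. A real runtime would also perform the conflict checks noted in the comments.

      // Manual version of the automatic overlap transformation (sketch).
      #include <mpi.h>
      #include <cstdio>
      #include <vector>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int me, np;
          MPI_Comm_rank(MPI_COMM_WORLD, &me);
          MPI_Comm_size(MPI_COMM_WORLD, &np);

          const int n = 1 << 16;
          double *shared;
          MPI_Win win;
          MPI_Win_allocate(n * sizeof(double), sizeof(double), MPI_INFO_NULL,
                           MPI_COMM_WORLD, &shared, &win);
          for (int i = 0; i < n; ++i) shared[i] = me;
          MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
          MPI_Barrier(MPI_COMM_WORLD);

          // 1. Issue the remote get as early as possible (the runtime hoists it).
          std::vector<double> remote(n);
          MPI_Request req;
          int neighbor = (me + 1) % np;
          MPI_Rget(remote.data(), n, MPI_DOUBLE, neighbor, 0, n, MPI_DOUBLE, win, &req);

          // 2. Independent local computation overlaps with the transfer.  A real
          //    runtime would first check that this code cannot write the source.
          double local_sum = 0.0;
          for (int i = 0; i < n; ++i) local_sum += shared[i];

          // 3. Complete the transfer only at the point of use.
          MPI_Wait(&req, MPI_STATUS_IGNORE);
          double remote_sum = 0.0;
          for (int i = 0; i < n; ++i) remote_sum += remote[i];
          std::printf("rank %d: local %.0f, remote %.0f\n", me, local_sum, remote_sum);

          MPI_Win_unlock_all(win);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }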

Sriram Krishnamoorthy - One of the best experts on this subject based on the ideXlab platform.

  • Work stealing for GPU-accelerated parallel programs in a Global Address Space framework
    Concurrency and Computation: Practice and Experience, 2016
    Co-Authors: Humayun Arafat, Sriram Krishnamoorthy, James Dinan, Pavan Balaji, P. Sadayappan
    Abstract:

    Task parallelism is an attractive approach to automatically load balancing the computation in a parallel system and adapting to the dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared- and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a Global Address Space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work-stealing algorithm for CPU-GPU systems, taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations, as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain. Copyright © 2016 John Wiley & Sons, Ltd.

  • Performance characterization of Global Address Space applications: a case study with NWChem
    Concurrency and Computation: Practice and Experience, 2012
    Co-Authors: Jeff R Hammond, Sriram Krishnamoorthy, Sameer Shende, Nichols A Romero, Allen D Malony
    Abstract:

    The use of Global Address Space languages and one-sided communication for complex applications is gaining attention in the parallel computing community. However, the lack of good evaluative methods to observe multiple levels of performance makes it difficult to isolate the cause of performance deficiencies and to understand the fundamental limitations of system and application design for future improvement. NWChem is a popular computational chemistry package, which depends on the Global Arrays/Aggregate Remote Memory Copy Interface suite for partitioned Global Address Space functionality to deliver high-end molecular modeling capabilities. A workload characterization methodology was developed to support NWChem performance engineering on large-scale parallel platforms. The research involved both the integration of performance instrumentation and measurement in the NWChem software and the analysis of one-sided communication performance in the context of NWChem workloads. Scaling studies were conducted for NWChem on Blue Gene/P and on two large-scale clusters using different generations of InfiniBand interconnects and x86 processors. The performance analysis and results show how subtle changes in runtime parameters related to the communication subsystem can have a significant impact on performance behavior. The tool has successfully identified several algorithmic bottlenecks, which are already being tackled by computational chemists to improve NWChem performance. Copyright © 2011 John Wiley & Sons, Ltd.

  • Scalable transparent checkpoint-restart of Global Address Space applications on virtual machines over InfiniBand
    Computing Frontiers, 2009
    Co-Authors: Oreste Villa, Jarek Nieplocha, Sriram Krishnamoorthy, David M Brown
    Abstract:

    Checkpoint-restart is one of the most widely used software approaches to achieving fault tolerance in high-end clusters. While standard techniques typically focus on user-level solutions, the advent of virtualization software has enabled efficient and transparent system-level approaches. In this paper, we present a scalable, transparent, system-level solution to address fault tolerance for applications based on Global Address Space (GAS) programming models on InfiniBand clusters. In addition to handling communication, the solution addresses transparent checkpointing of user-generated files. We exploit the support for the InfiniBand network in the Xen virtual machine environment. We have developed a version of the Aggregate Remote Memory Copy Interface (ARMCI) one-sided communication library capable of suspending and resuming applications. We present efficient and scalable mechanisms to distribute checkpoint requests and to back up virtual machine memory images and file systems. We tested our approach in the context of NWChem, a popular computational chemistry suite. We demonstrated that NWChem can be executed, without any modification to the source code, on a virtualized 8-node cluster with very little overhead (below 3%). We observe that the total checkpoint time is limited by disk I/O. Finally, we measured system-size-dependent components of the checkpoint time on up to 1024 cores (128 nodes), demonstrating the scalability of our approach on medium- and large-scale systems.
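
    A minimal sketch of the checkpoint coordination order described above, with hypothetical comm_suspend, comm_resume, and snapshot_node placeholders standing in for the ARMCI suspend/resume hooks and the Xen snapshot: communication is quiesced, all nodes synchronize, each node writes its image and user files, and execution resumes.

      // Coordinated checkpoint of a one-sided-communication application (sketch).
      // comm_suspend/comm_resume/snapshot_node are hypothetical placeholders.
      #include <mpi.h>
      #include <cstdio>

      static void comm_suspend() { /* placeholder: quiesce the one-sided comm library */ }
      static void comm_resume()  { /* placeholder: re-arm the one-sided comm library  */ }
      static void snapshot_node(int epoch) {
          // placeholder: flush local files and trigger the VM memory/disk snapshot
          std::printf("snapshot for epoch %d written\n", epoch);
      }

      static void checkpoint(int epoch, MPI_Comm comm) {
          comm_suspend();                 // no one-sided traffic may be in flight
          MPI_Barrier(comm);              // all nodes reach a consistent point
          snapshot_node(epoch);           // per-node image plus user-generated files
          MPI_Barrier(comm);              // wait until every snapshot is complete
          comm_resume();                  // application continues transparently
      }

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          for (int epoch = 0; epoch < 3; ++epoch) {
              /* ... run a slice of the application ... */
              checkpoint(epoch, MPI_COMM_WORLD);
          }
          MPI_Finalize();
          return 0;
      }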

  • A Global Address Space framework for locality aware scheduling of block-sparse computations
    2007 IEEE International Parallel and Distributed Processing Symposium, 2007
    Co-Authors: Sriram Krishnamoorthy, Jarek Nieplocha, Atanas Rountev, Umit Catalyurek, P. Sadayappan
    Abstract:

    In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a Global Address Space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes.