Parallel Application

The Experts below are selected from a list of 22,035 Experts worldwide, ranked by the ideXlab platform.

Shankar Balachandran - One of the best experts on this subject based on the ideXlab platform.

  • PACT - XStream: cross-core spatial streaming based MLC prefetchers for Parallel Applications in CMPs
    Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14, 2014
    Co-Authors: Biswabandan Panda, Shankar Balachandran
    Abstract:

    Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential Applications running on a multicore system. In contrast to multiple independent Applications, a single Parallel Application running on a multicore system exhibits different behavior. In the case of a Parallel Application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts the cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with the ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geomean), compared to the state-of-the-art spatial memory streaming, storage-efficient XStream reduces the execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.
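
A minimal software sketch of the cross-core stream-sharing idea described above (not the XStream hardware design): when one core's MLC misses fall in a spatial region that another core has already streamed through, the recorded block pattern is forwarded to the new core's prefetch queue. The region size, block size and "last consumer" next-core predictor are illustrative assumptions.

```python
from collections import defaultdict

REGION_BITS = 11   # 2 KB spatial regions (assumed granularity)
BLOCK_BITS = 6     # 64 B cache blocks

def region_of(addr):
    return addr >> REGION_BITS

def block_offset(addr):
    return (addr >> BLOCK_BITS) & ((1 << (REGION_BITS - BLOCK_BITS)) - 1)

class CrossCoreStreamer:
    def __init__(self):
        self.patterns = defaultdict(set)          # region -> block offsets seen so far
        self.last_core = {}                       # region -> last core that missed there
        self.prefetch_queue = defaultdict(list)   # core -> predicted block addresses

    def on_mlc_miss(self, core, addr):
        region = region_of(addr)
        self.patterns[region].add(block_offset(addr))
        prev = self.last_core.get(region)
        if prev is not None and prev != core:
            # Another core streamed through this region first: forward its
            # recorded spatial pattern to the current core's MLC prefetcher.
            base = region << REGION_BITS
            self.prefetch_queue[core] += [base + (off << BLOCK_BITS)
                                          for off in sorted(self.patterns[region])]
        self.last_core[region] = core

# Core 0 streams through a region; core 1's first miss in the same region then
# triggers prefetches for the blocks core 0 already touched.
xs = CrossCoreStreamer()
for addr in (0x1000, 0x1040, 0x1080):
    xs.on_mlc_miss(0, addr)
xs.on_mlc_miss(1, 0x1000)
print([hex(a) for a in xs.prefetch_queue[1]])
```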

  • XStream: Cross-core spatial streaming based MLC prefetchers for Parallel Applications in CMPs
    2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), 2014
    Co-Authors: Biswabandan Panda, Shankar Balachandran
    Abstract:

    Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential Applications running on a multicore system. In contrast to multiple independent Applications, a single Parallel Application running on a multicore system exhibits different behavior. In the case of a Parallel Application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts the cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with the ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geomean), compared to the state-of-the-art spatial memory streaming, storage-efficient XStream reduces the execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.

Cynthia S. Hood - One of the best experts on this subject based on the ideXlab platform.

  • Modeling Parallel Application sensitivity to network performance
    2020
    Co-Authors: Cynthia S. Hood, Jeffrey J. Evans
    Abstract:

    Highly variable Parallel Application execution time is a persistent issue in cluster computing environments, and can be particularly acute in systems composed of Networks of Workstations (NOWs). Performance modeling and management in these computing environments has focused on performance optimization of a single subsystem or Application, often on a single system. This work focuses on network performance and uses techniques from fault management to define systemic performance consistency. The goal of this research is to characterize Parallel Application sensitivity to network performance and develop a strategy for its use. The method developed, called "Parallel Application Run time Sensitivity Evaluation" (PARSE), uses the "Parallel Application Communication Emulation" (PACE) framework to identify Application run time sensitivity to network performance degradation. When used together, PARSE and PACE can characterize and evaluate a Parallel Application without the need to instrument it. Results demonstrate how PARSE and PACE expose and quantify run time sensitivity to network performance degradation. This work also defines a continuous-variable sensitivity factor and demonstrates how Application run time statistics influenced by PACE can be used to quantify it. The sensitivity factor is independent of the Application and considers changes in the mean and coefficient of variation. The characterization of Application sensitivity can be used to set network performance goals, thereby defining soft faults. Network performance also depends on the virtual topology imposed by the scheduler's allocation of nodes and the communication patterns of the set of all running Applications. The sensitivity factor can be used strategically by other subsystems to maintain consistent systemic performance. It can also be used to aid in program design and tuning.
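
The abstract states that the sensitivity factor considers changes in the mean and coefficient of variation of run time. The sketch below combines those two relative changes into one number purely for illustration; the paper's actual formula is not reproduced here, and the run times are invented.

```python
import statistics

def cov(samples):
    """Coefficient of variation: standard deviation relative to the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)

def sensitivity_factor(baseline_runs, degraded_runs):
    base_mean = statistics.mean(baseline_runs)
    rel_mean_change = (statistics.mean(degraded_runs) - base_mean) / base_mean
    cov_change = cov(degraded_runs) - cov(baseline_runs)
    return rel_mean_change + cov_change   # assumed way of combining the two effects

# Example: repeated run times (s) with an undisturbed network vs. with injected delay.
baseline = [102.1, 101.7, 102.4, 101.9]
degraded = [118.6, 131.2, 124.9, 140.3]
print(f"sensitivity factor ~ {sensitivity_factor(baseline, degraded):.3f}")
```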

  • A network performance sensitivity metric for Parallel Applications
    International Journal of High Performance Computing and Networking, 2011
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Excessive run time variability of Parallel Application codes on commodity clusters is a significant challenge. To gain insight into this problem, our earlier work developed tools to emulate Parallel Applications (PACE) by simulating computation and using the cluster's interconnection network for communication, and to further study Parallel Application run time sensitivity to controlled network performance degradation (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used Parallel benchmarks and Application code fragments. The metric suggests that a Parallel Application's sensitivity to network performance variation can be quantified relative to its behaviour in optimal network performance conditions. Ideas on how this metric can be useful to Parallel Application development, cluster system performance management and system administration are also presented.
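
A rough illustration of quantifying sensitivity relative to optimal-network behaviour: normalise the run time measured at each controlled degradation level by the undisturbed run time and summarise the resulting slowdown curve, here with a simple least-squares slope. This is an assumed formulation for illustration; the specific metric in the paper may differ.

```python
def relative_slowdowns(optimal_time, times_by_degradation):
    """Run time at each degradation level, normalised by the optimal-network time."""
    return {level: t / optimal_time for level, t in times_by_degradation.items()}

def sensitivity_slope(optimal_time, times_by_degradation):
    # Least-squares slope (through the origin) of excess slowdown vs. degradation level.
    pts = [(level, t / optimal_time - 1.0) for level, t in times_by_degradation.items()]
    return sum(x * y for x, y in pts) / sum(x * x for x, _ in pts)

# Example: added per-message delay (ms) -> measured run time (s); 100 s is the baseline.
measured = {1: 105.0, 5: 126.0, 10: 158.0}
print(relative_slowdowns(100.0, measured))
print(f"sensitivity ~ {sensitivity_slope(100.0, measured):.3f} per ms of added delay")
```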

  • ISPA - A network performance sensitivity metric for Parallel Applications
    Parallel and Distributed Processing and Applications, 2007
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Excessive run time variability of Parallel Application codes on commodity clusters is a significant challenge. To gain insight into this problem, our earlier work developed a tool to emulate Parallel Applications (PACE) by simulating computation and using the cluster's interconnection network for communication, and to further study Parallel Application run time effects (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used Parallel benchmarks and Application code fragments. The metric suggests that a Parallel Application's sensitivity to network performance variation can be quantified relative to its behavior in optimal network performance conditions. Ideas on how this metric can be useful to Parallel Application development, cluster system performance management and system administration are also presented.

  • PARSE: a tool for Parallel Application run time sensitivity evaluation
    12th International Conference on Parallel and Distributed Systems - (ICPADS'06), 2006
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Run time variability of Parallel Application codes continues to be a significant challenge in clusters. We are studying run time variability at the communication level from the perspective of the Application, focusing on the network. To gain insight into this problem, our earlier work developed a tool to emulate Parallel Applications and in particular their communication. This framework, called Parallel Application communication emulation (PACE), has produced interesting insights regarding network performance in NOW clusters. A Parallel Application run time sensitivity evaluation (PARSE) function has been added to the PACE framework to study the run time effects of controlled network performance degradation. This paper introduces PARSE and presents experimental results from tests conducted on several widely used Parallel benchmarks and Application code fragments. The results suggest that Parallel Applications can be classified in terms of their sensitivity to network performance variation.
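
A toy emulation loop in the spirit of PACE with a PARSE-style injected delay (an assumed structure, not the actual tools): computation and communication phases are modelled as sleeps, and the controlled degradation is an extra per-message delay added to the communication phase.

```python
import time

def emulate(iterations, compute_s, comm_s, injected_delay_s=0.0):
    """Alternate emulated computation and communication; return wall-clock time."""
    start = time.perf_counter()
    for _ in range(iterations):
        time.sleep(compute_s)                   # computation phase
        time.sleep(comm_s + injected_delay_s)   # communication phase + injected degradation
    return time.perf_counter() - start

baseline = emulate(iterations=20, compute_s=0.01, comm_s=0.002)
degraded = emulate(iterations=20, compute_s=0.01, comm_s=0.002, injected_delay_s=0.003)
print(f"baseline {baseline:.3f} s, degraded {degraded:.3f} s, "
      f"slowdown {degraded / baseline:.2f}x")
```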

  • ICPADS (1) - PARSE: a tool for Parallel Application run time sensitivity evaluation
    12th International Conference on Parallel and Distributed Systems - (ICPADS'06), 2006
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Run time variability of Parallel Application codes continues to be a significant challenge in clusters. We are studying run time variability at the communication level from the perspective of the Application, focusing on the network. To gain insight into this problem, our earlier work developed a tool to emulate Parallel Applications and in particular their communication. This framework, called Parallel Application communication emulation (PACE), has produced interesting insights regarding network performance in NOW clusters. A Parallel Application run time sensitivity evaluation (PARSE) function has been added to the PACE framework to study the run time effects of controlled network performance degradation. This paper introduces PARSE and presents experimental results from tests conducted on several widely used Parallel benchmarks and Application code fragments. The results suggest that Parallel Applications can be classified in terms of their sensitivity to network performance variation.

Biswabandan Panda - One of the best experts on this subject based on the ideXlab platform.

  • PACT - XStream: cross-core spatial streaming based MLC prefetchers for Parallel Applications in CMPs
    Proceedings of the 23rd international conference on Parallel architectures and compilation - PACT '14, 2014
    Co-Authors: Biswabandan Panda, Shankar Balachandran
    Abstract:

    Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential Applications running on a multicore system. In contrast to multiple independent Applications, a single Parallel Application running on a multicore system exhibits different behavior. In the case of a Parallel Application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts the cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with the ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geomean), compared to the state-of-the-art spatial memory streaming, storage-efficient XStream reduces the execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.

  • XStream: Cross-core spatial streaming based MLC prefetchers for Parallel Applications in CMPs
    2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), 2014
    Co-Authors: Biswabandan Panda, Shankar Balachandran
    Abstract:

    Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency. Prefetching techniques in the literature are designed for multiple independent sequential Applications running on a multicore system. In contrast to multiple independent Applications, a single Parallel Application running on a multicore system exhibits different behavior. In the case of a Parallel Application, cores share and communicate data and code among themselves, and there is commonality in the demand miss streams across multiple cores. This gives an opportunity to predict the demand miss streams and communicate the predicted streams from one core to another, which we refer to as cross-core stream communication. We propose cross-core spatial streaming (XStream), a practical and storage-efficient cross-core prefetching technique. XStream detects and predicts the cross-core spatial streams at the private mid-level caches (MLCs) and sends the predicted streams in advance to the MLC prefetchers of the predicted cores. We compare the effectiveness of XStream with the ideal cross-core spatial streamer. Experimental results demonstrate that, on average (geomean), compared to the state-of-the-art spatial memory streaming, storage-efficient XStream reduces the execution time by 11.3% (as high as 24%) and 9% (as high as 29.09%) for 4-core and 8-core systems, respectively.

Jeffrey J. Evans - One of the best experts on this subject based on the ideXlab platform.

  • Modeling Parallel Application sensitivity to network performance
    2020
    Co-Authors: Cynthia S. Hood, Jeffrey J. Evans
    Abstract:

    Highly variable Parallel Application execution time is a persistent issue in cluster computing environments, and can be particularly acute in systems composed of Networks of Workstations (NOWs). Performance modeling and management in these computing environments has focused on performance optimization of a single subsystem or Application, often on a single system. This work focuses on network performance and uses techniques from fault management to define systemic performance consistency. The goal of this research is to characterize Parallel Application sensitivity to network performance and develop a strategy for its use. The method developed, called "Parallel Application Run time Sensitivity Evaluation" (PARSE), uses the "Parallel Application Communication Emulation" (PACE) framework to identify Application run time sensitivity to network performance degradation. When used together, PARSE and PACE can characterize and evaluate a Parallel Application without the need to instrument it. Results demonstrate how PARSE and PACE expose and quantify run time sensitivity to network performance degradation. This work also defines a continuous-variable sensitivity factor and demonstrates how Application run time statistics influenced by PACE can be used to quantify it. The sensitivity factor is independent of the Application and considers changes in the mean and coefficient of variation. The characterization of Application sensitivity can be used to set network performance goals, thereby defining soft faults. Network performance also depends on the virtual topology imposed by the scheduler's allocation of nodes and the communication patterns of the set of all running Applications. The sensitivity factor can be used strategically by other subsystems to maintain consistent systemic performance. It can also be used to aid in program design and tuning.

  • PARSE 2.0: A Tool for Parallel Application Run Time Behavior Evaluation
    2011 31st International Conference on Distributed Computing Systems Workshops, 2011
    Co-Authors: Jeffrey J. Evans, Charles E. Lucas
    Abstract:

    Run time variability of Parallel Applications continues to be a significant challenge in high performance computing (HPC) systems. We are currently studying run time variability in the context of both systemic performance and energy management. Our perspective is that of the Application, focusing on the interactions between the inter-process communication system and the set of concurrently executing Parallel Applications. In such a scenario, Application run time can be extended and become highly variable. While some Applications may be more sensitive to these interactions, others may in fact be generating the interactions that cause inconsistent run time, thus forming the notion of Application-level behavioral attributes. To gain insight into this problem, our earlier work developed a framework that emulates Parallel Applications, called PACE. We also introduced a Parallel Application Run time Sensitivity Evaluation (PARSE) function that uses the PACE framework to study the run time effects of controlled network performance degradation on Applications. Inter-process communication has evolved over the last decade from network communication between single-processor, single-core nodes to hybrid systems whose compute nodes contain several multi-core processor units. Motivated by the evolution of compute hardware and systems software, this work introduces PARSE 2.0, a nearly complete rewrite that extends PARSE's capabilities by fully automating the processes of evaluating and quantifying run-time-critical Parallel Application-level behavioral attributes. We present an overview of the tool and the attributes being evaluated, and present experimental results from tests conducted on several widely used Parallel benchmarks and Application code fragments. The results reinforce our earlier work, demonstrating that Parallel Applications can be classified according to their behavioral attributes, in the context of communication system resources.
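
A hypothetical sketch of classifying an Application by two behavioral attributes, sensitivity to network degradation and the communication load it generates itself. The attribute set, thresholds and category names are assumptions for illustration, not the attributes evaluated by PARSE 2.0.

```python
def classify(slowdown_under_degradation, messages_per_second,
             sensitive_if=1.10, aggressive_if=1000):
    """Bucket an Application by two assumed behavioral attributes."""
    sensitive = slowdown_under_degradation >= sensitive_if
    aggressive = messages_per_second >= aggressive_if
    return {
        (True, True): "sensitive and communication-heavy",
        (True, False): "sensitive, light communicator",
        (False, True): "tolerant, communication-heavy",
        (False, False): "tolerant, light communicator",
    }[(sensitive, aggressive)]

print(classify(1.35, 250))    # e.g. a latency-bound code with sparse messaging
print(classify(1.02, 5000))   # e.g. a bandwidth-heavy but tolerant code
```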

  • A network performance sensitivity metric for Parallel Applications
    International Journal of High Performance Computing and Networking, 2011
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Excessive run time variability of Parallel Application codes on commodity clusters is a significant challenge. To gain insight into this problem, our earlier work developed tools to emulate Parallel Applications (PACE) by simulating computation and using the cluster's interconnection network for communication, and to further study Parallel Application run time sensitivity to controlled network performance degradation (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used Parallel benchmarks and Application code fragments. The metric suggests that a Parallel Application's sensitivity to network performance variation can be quantified relative to its behaviour in optimal network performance conditions. Ideas on how this metric can be useful to Parallel Application development, cluster system performance management and system administration are also presented.

  • ISPA - A network performance sensitivity metric for Parallel Applications
    Parallel and Distributed Processing and Applications, 2007
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Excessive run time variability of Parallel Application codes on commodity clusters is a significant challenge. To gain insight into this problem, our earlier work developed a tool to emulate Parallel Applications (PACE) by simulating computation and using the cluster's interconnection network for communication, and to further study Parallel Application run time effects (PARSE). This work expands our previous efforts by presenting a metric derived from PARSE test results conducted on several widely used Parallel benchmarks and Application code fragments. The metric suggests that a Parallel Application's sensitivity to network performance variation can be quantified relative to its behavior in optimal network performance conditions. Ideas on how this metric can be useful to Parallel Application development, cluster system performance management and system administration are also presented.

  • PARSE: a tool for Parallel Application run time sensitivity evaluation
    12th International Conference on Parallel and Distributed Systems - (ICPADS'06), 2006
    Co-Authors: Jeffrey J. Evans, Cynthia S. Hood
    Abstract:

    Run time variability of Parallel Application codes continues to be a significant challenge in clusters. We are studying run time variability at the communication level from the perspective of the Application, focusing on the network. To gain insight into this problem, our earlier work developed a tool to emulate Parallel Applications and in particular their communication. This framework, called Parallel Application communication emulation (PACE), has produced interesting insights regarding network performance in NOW clusters. A Parallel Application run time sensitivity evaluation (PARSE) function has been added to the PACE framework to study the run time effects of controlled network performance degradation. This paper introduces PARSE and presents experimental results from tests conducted on several widely used Parallel benchmarks and Application code fragments. The results suggest that Parallel Applications can be classified in terms of their sensitivity to network performance variation.

Emilio Luque - One of the best experts on this subject based on the ideXlab platform.

  • Parallel Application Signature for Performance Analysis and Prediction
    IEEE Transactions on Parallel and Distributed Systems, 2015
    Co-Authors: A. Wong, Dolores Rexachs, Emilio Luque
    Abstract:

    Predicting the performance of Parallel scientific Applications is becoming increasingly complex. Our goal was to characterize the behavior of message-passing Applications on different target machines. To achieve this goal, we developed a method called Parallel Application signature for performance prediction (PAS2P), which strives to describe an Application based on its behavior. Based on the Application's message-passing activity, we identified and extracted representative phases, with which we created a Parallel Application signature that enabled us to predict the Application's performance. We experimented with different scientific Applications on different clusters and were able to predict execution times with an average accuracy greater than 97 percent.
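
One way to read the prediction step described above: once representative phases and their repetition counts (weights) are known, the predicted run time on a target machine is the weighted sum of the per-phase times measured there. The sketch below uses made-up phase names and numbers; it is not the PAS2P implementation.

```python
def predict_runtime(phase_weights, measured_phase_times):
    """Weighted sum of per-phase times measured once on the target machine."""
    return sum(phase_weights[p] * measured_phase_times[p] for p in phase_weights)

# Hypothetical signature: phase -> number of repetitions in the full Application.
signature = {"halo_exchange": 480, "global_reduce": 120, "checkpoint_io": 10}
# Per-phase times (s) measured by running only the signature on the target cluster.
times_on_target = {"halo_exchange": 0.021, "global_reduce": 0.008, "checkpoint_io": 1.9}
print(f"predicted run time ~ {predict_runtime(signature, times_on_target):.1f} s")
```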

  • POSTER: “Analysis of scalability: A Parallel Application model approach”
    2014 IEEE International Conference on Cluster Computing (CLUSTER), 2014
    Co-Authors: Javier Panadero, Dolores Rexachs, A. Wong, Emilio Luque
    Abstract:

    In this paper we propose a methodology that allows us to predict the Application scalability behavior in a specific system, providing information to select the most appropriate resources to run the Application. We explain the general methodology, focusing on the presentation of a novel method to model the logical Application trace for a large number of processes. This method is based on the projection of a set of executions of the Application signature for a small number of processes. The generated traces are validated by comparing them with the real traces obtained with the PAS2P tool. We present the experimental validation for the BT NAS Parallel Benchmark. The signatures for 16, 36, 64, 81 and 100 processes were executed and used to model and project the logical trace for 1024 processes. The results obtained show the accuracy of the method. The communication pattern was predicted without error, while the prediction error is less than 10% for the communication volume and less than 5% for the number of instructions.
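
An illustrative stand-in for the projection idea: fit a simple power law to a per-process quantity measured in small-scale signature runs and evaluate it at the target process count. The paper projects the full logical trace; this only shows extrapolation from small runs, with invented numbers.

```python
import math

def fit_power_law(points):
    """Least-squares fit of y = a * p**b in log-log space."""
    n = len(points)
    lx = [math.log(p) for p, _ in points]
    ly = [math.log(y) for _, y in points]
    b = (n * sum(x * y for x, y in zip(lx, ly)) - sum(lx) * sum(ly)) / \
        (n * sum(x * x for x in lx) - sum(lx) ** 2)
    a = math.exp((sum(ly) - b * sum(lx)) / n)
    return a, b

# Invented per-process instruction counts from small-scale signature runs.
samples = [(16, 9.8e9), (36, 4.5e9), (64, 2.6e9), (81, 2.1e9), (100, 1.7e9)]
a, b = fit_power_law(samples)
print(f"projected instructions per process at 1024 processes ~ {a * 1024 ** b:.2e}")
```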

  • HPCC - Extraction of Parallel Application Signatures for Performance Prediction
    2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC), 2010
    Co-Authors: A. Wong, Dolores Rexachs, Emilio Luque
    Abstract:

    Predicting the performance of Parallel Applications is becoming increasingly complex and the best performance predictor is the Application itself, but the time required to run it thoroughly is an onerous requirement. We seek to characterize the behavior of message-passing Applications on different systems by extracting a signature that will allow us to predict on which system the Application will perform best. To achieve this goal, we have developed a method we called Parallel Application Signatures for Performance Prediction (PAS2P) that strives to describe an Application based on its behavior. Based on the Application’s message-passing activity, we have been able to identify and extract representative phases, with which we created a Parallel Application Signature that has allowed us to predict the Application’s performance. We have experimented with different signature-extraction algorithms and found a reduction in the prediction error using different scientific Applications on different clusters. We were able to predict execution times with an average accuracy of over 98%.

  • Extraction of Parallel Application Signatures for Performance Prediction
    2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC), 2010
    Co-Authors: A. Wong, Dolores Rexachs, Emilio Luque
    Abstract:

    Predicting the performance of Parallel Applications is becoming increasingly complex and the best performance predictor is the Application itself, but the time required to run it thoroughly is an onerous requirement. We seek to characterize the behavior of message-passing Applications on different systems by extracting a signature that will allow us to predict on which system the Application will perform best. To achieve this goal, we have developed a method we called Parallel Application Signatures for Performance Prediction (PAS2P) that strives to describe an Application based on its behavior. Based on the Application's message-passing activity, we have been able to identify and extract representative phases, with which we created a Parallel Application Signature that has allowed us to predict the Application's performance. We have experimented with different signature-extraction algorithms and found a reduction in the prediction error using different scientific Applications on different clusters. We were able to predict execution times with an average accuracy of over 98%.

  • A Performance Tuning Strategy for Complex Parallel Application
    2010 18th Euromicro Conference on Parallel Distributed and Network-based Processing, 2010
    Co-Authors: Jose Alexander Guevara, Eduardo Cesar, Joan Sorribes, Andreu Moreno, Tomàs Margalef, Emilio Luque
    Abstract:

    Defining performance models associated with the Application structure has proven to be a useful strategy for implementing dynamic tuning tools. However, to extend this strategy to more complex Applications (those composed of different structures), it must integrate a policy for distributing resources among the different Application components. Consequently, we propose to take advantage of the knowledge of these models and combine them with a resource management policy to obtain a global model. In this sense, this work constitutes an ongoing effort in the development of performance models for dynamic tuning.
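
A hypothetical sketch of what combining per-component performance models with a resource management policy could look like: greedily assign each extra processor to the component whose (placeholder) model predicts the largest remaining completion time. The ideal 1/p scaling models and component names are assumptions, not the structure-specific models discussed in the paper.

```python
def distribute(total_procs, serial_times):
    """Greedy allocation: give each extra processor to the slowest component."""
    alloc = {name: 1 for name in serial_times}   # every component gets at least one
    for _ in range(total_procs - len(serial_times)):
        slowest = max(alloc, key=lambda name: serial_times[name] / alloc[name])
        alloc[slowest] += 1
    return alloc

# Placeholder per-component serial times (s) for an Application mixing structures.
components = {"pipeline_stage": 400.0, "master_worker": 900.0, "data_reduction": 150.0}
print(distribute(total_procs=32, serial_times=components))
```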