Cache Behavior

The Experts below are selected from a list of 9825 Experts worldwide ranked by ideXlab platform

Yan Solihin - One of the best experts on this subject based on the ideXlab platform.

  • WEST: Cloning data Cache Behavior using Stochastic Traces
    High-Performance Computer Architecture, 2012
    Co-Authors: Ganesh Balakrishnan, Yan Solihin
    Abstract:

    Cache designers need an in-depth understanding of end user workloads, but certain end users are apprehensive about sharing code or traces due to the proprietary or confidential nature of code and data. To bridge this gap, Cache designers use a reduced representation of the code (a clone). A promising cloning approach is the black box approach, where workloads are profiled to obtain key statistics, and a clone is automatically generated. Despite its potential, currently there are no highly accurate black box cloning methods for replicating data Cache Behavior. We propose Workload Emulation using Stochastic Traces (WEST), a highly accurate black box cloning technique for replicating data Cache Behavior of arbitrary programs. First, we analyze what profiling statistics are necessary and sufficient to capture a workload. Then, we generate a clone stochastically that produces statistics identical to the proprietary workload. WEST clones can be used in lieu of the workload for exploring Cache sizes, associativities, write policies, replacement policies, Cache hierarchies and co-scheduling, at a significantly reduced simulation time. We use a simple IPC model to control the rate of accesses to the Cache hierarchy. We evaluated WEST using CPU2006 and BioBench suites over a wide Cache design space for single core and dual core CMPs. The clones achieve an average error in miss ratio of only 0.4% across 1394 single core Cache configurations. For co-scheduled mixes, WEST achieves an average error in miss ratio of only 3.1% for over 600 configurations.

Reinhard Wilhelm - One of the best experts on this subject based on the ideXlab platform.

  • Component-Wise Instruction Cache Behavior Prediction
    Automated Technology for Verification and Analysis, 2004
    Co-Authors: Abdur Rakib, Oleg Parshin, Stephan Thesing, Reinhard Wilhelm
    Abstract:

    The precise determination of worst-case execution times (WCETs) for programs is mostly being performed on fully linked executables, since all needed information is available and all machine parameters influencing Cache performance are available to the analysis. This paper describes how to perform a component-wise prediction of the instruction Cache Behavior guaranteeing conservative results compared to an analysis of a fully linked executable. This proves the correctness of the method based on a previous proof of correctness of the analysis of fully linked executables. The analysis is described for a general A-way set associative Cache. The only assumption is that the replacement strategy is LRU.

  • Efficient and Precise Cache Behavior Prediction for Real-Time Systems
    Real-Time Systems, 1999
    Co-Authors: Christian Ferdinand, Reinhard Wilhelm
    Abstract:

    Abstract interpretation is a technique for the static detection of dynamic properties of programs. It is semantics based, that is, it computes approximative properties of the semantics of programs. On this basis, it supports correctness proofs of analyses. It replaces commonly used ad hoc techniques by systematic, provable ones, and it allows for the automatic generation of analyzers from specifications by existing tools. In this work, abstract interpretation is applied to the problem of predicting the Cache Behavior of programs. Abstract semantics of machine programs are defined which determine the contents of Caches. For interprocedural analysis, existing methods are examined and a new approach that is especially tailored for the Cache analysis is presented. This allows for a static classification of the Cache Behavior of memory references of programs. The calculated information can be used to improve worst case execution time estimations. It is possible to analyze instruction, data, and combined instruction/data Caches for common (re)placement and write strategies. Experimental results are presented that demonstrate the applicability of the analyses.

  • Cache Behavior prediction by abstract interpretation
    Science of Computer Programming, 1999
    Co-Authors: Christian Ferdinand, Florian Martin, Reinhard Wilhelm, Martin Alt
    Abstract:

    Abstract interpretation is a technique for the static detection of dynamic properties of programs. It is semantics-based, that is, it computes approximative properties of the semantics of programs. On this basis, it allows for correctness proofs of analyses. It replaces commonly used ad hoc techniques by systematic, provable ones, and it allows the automatic generation of analyzers from specifications as in the Program Analyzer Generator (PAG). In this paper, abstract interpretation is applied to the problem of predicting the Cache Behavior of programs. Abstract semantics of machine programs are defined which determine the contents of Caches. For interprocedural analysis, existing methods are examined and a new approach that is especially tailored for the Cache analysis is presented. This allows for a static classification of the Cache Behavior of memory references of programs. The calculated information can be used to sharpen worst-case execution time estimations. It is possible to analyze instruction, data, and combined instruction/data Caches for common (re)placement and write strategies. Experimental results are presented that demonstrate the applicability of the analysis.
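
The static classification described in the abstracts above can be illustrated for a single Cache set under LRU replacement. The sketch below keeps, for each memory block, an upper bound on its LRU age; a block that is still present in this abstract state is guaranteed to be cached on every path, so an access to it can be classified as a sure hit. This is a simplified "must"-style analysis written for illustration, not code from the cited papers. The function names (`must_update`, `must_join`, `classify`) and the dictionary representation are assumptions.

```python
from typing import Dict

# Abstract state of ONE cache set: block name -> upper bound on its LRU age
# (0 = most recently used). Blocks not listed may or may not be cached.
AbstractSet = Dict[str, int]


def must_update(state: AbstractSet, block: str, ways: int) -> AbstractSet:
    """Abstract effect of accessing `block` in a set with `ways` lines (LRU)."""
    # A block that is not known to be cached is treated as older than all others.
    old_age = state.get(block, ways)
    new_state: AbstractSet = {}
    for other, age in state.items():
        if other == block:
            continue
        # Only blocks that were younger than the accessed block grow older.
        aged = age + 1 if age < old_age else age
        if aged < ways:  # bounds >= ways mean the block may have been evicted
            new_state[other] = aged
    new_state[block] = 0
    return new_state


def must_join(a: AbstractSet, b: AbstractSet) -> AbstractSet:
    """Merge states at a control-flow join: keep common blocks, take the older bound."""
    return {blk: max(a[blk], b[blk]) for blk in a.keys() & b.keys()}


def classify(state: AbstractSet, block: str) -> str:
    """A block present in the must-state hits on every execution path."""
    return "always-hit" if block in state else "not-classified"


if __name__ == "__main__":
    then_branch = must_update(must_update({}, "a", 2), "b", 2)
    else_branch = must_update(must_update({}, "b", 2), "c", 2)
    merged = must_join(then_branch, else_branch)
    print(merged, classify(merged, "b"), classify(merged, "a"))
```

Taking the maximum age at a join is what keeps the result conservative: a reference is reported as a sure hit only if it hits on all incoming paths, which is the property a worst-case execution time estimate needs.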

Balakrishnan Ganesh - One of the best experts on this subject based on the ideXlab platform.

  • HPCA - WEST: Cloning data Cache Behavior using Stochastic Traces
    IEEE International Symposium on High-Performance Computer Architecture, 2012
    Co-Authors: Balakrishnan Ganesh, Yan Solihin
    Abstract:

    Cache designers need an in-depth understanding of end user workloads, but certain end users are apprehensive about sharing code or traces due to the proprietary or confidential nature of code and data. To bridge this gap, Cache designers use a reduced representation of the code (a clone). A promising cloning approach is the black box approach, where workloads are profiled to obtain key statistics, and a clone is automatically generated. Despite its potential, currently there are no highly accurate black box cloning methods for replicating data Cache Behavior. We propose Workload Emulation using Stochastic Traces (WEST), a highly accurate black box cloning technique for replicating data Cache Behavior of arbitrary programs. First, we analyze what profiling statistics are necessary and sufficient to capture a workload. Then, we generate a clone stochastically that produces statistics identical to the proprietary workload. WEST clones can be used in lieu of the workload for exploring Cache sizes, associativities, write policies, replacement policies, Cache hierarchies and co-scheduling, at a significantly reduced simulation time. We use a simple IPC model to control the rate of accesses to the Cache hierarchy. We evaluated WEST using CPU2006 and BioBench suites over a wide Cache design space for single core and dual core CMPs. The clones achieve an average error in miss ratio of only 0.4% across 1394 single core Cache configurations. For co-scheduled mixes, WEST achieves an average error in miss ratio of only 3.1% for over 600 configurations.
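
The accuracy figures quoted above are errors in miss ratio between a workload and its clone over many Cache configurations. The sketch below shows how such a comparison could be computed for one configuration. It is not WEST itself and does not generate clones; it is a minimal set-associative LRU simulator, and the function names (`miss_ratio`, `miss_ratio_error`), the 64-byte line size, and the tiny address traces are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import List, Sequence


def miss_ratio(trace: Sequence[int], num_sets: int, ways: int,
               line_size: int = 64) -> float:
    """Simulate a set-associative LRU cache and return misses / accesses."""
    # One ordered dict per set; insertion order tracks recency (last = newest).
    sets: List[OrderedDict] = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)          # hit: refresh recency
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)    # evict least recently used
            cache_set[line] = True
    return misses / len(trace) if trace else 0.0


def miss_ratio_error(original: Sequence[int], clone: Sequence[int],
                     num_sets: int, ways: int) -> float:
    """Absolute difference in miss ratio between a workload and its clone."""
    return abs(miss_ratio(original, num_sets, ways)
               - miss_ratio(clone, num_sets, ways))


if __name__ == "__main__":
    original = [0, 64, 0, 128, 64]   # hypothetical byte addresses
    clone = [0, 64, 128, 0, 64]      # hypothetical stand-in for a clone trace
    for ways in (2, 4):
        print(ways, miss_ratio_error(original, clone, num_sets=1, ways=ways))
```

Sweeping `num_sets` and `ways` over a grid and averaging `miss_ratio_error` gives the kind of per-configuration average that the abstract reports.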

Chun Jason Xue - One of the best experts on this subject based on the ideXlab platform.

  • Joint task assignment and Cache partitioning with Cache locking for WCET minimization on MPSoC
    Journal of Parallel and Distributed Computing, 2011
    Co-Authors: Tiantian Liu, Yingchao Zhao, Chun Jason Xue
    Abstract:

    Cache locking technique is often utilized to guarantee a tighter prediction of Worst-Case Execution Time (WCET) which is one of the most important performance metrics for embedded systems. However, in Multi-Processor Systems-on-Chip (MPSoC) systems with multi-tasks, Level 2 (L2) Cache is often shared among different tasks and cores, which leads to extended unpredictability of Cache. Task assignment has inherent relevancy for Cache Behavior, while Cache Behavior also affects the efficiency of task assignment. Task assignment and Cache Behavior have dramatic influences on the overall WCET of MPSoC. This paper proposes joint task assignment and Cache partitioning techniques to minimize the overall WCET for MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET. We prove that the joint problem is NP-hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.

  • ICPP - Task Assignment with Cache Partitioning and Locking for WCET Minimization on MPSoC
    2010 39th International Conference on Parallel Processing, 2010
    Co-Authors: Tiantian Liu, Yingchao Zhao, Chun Jason Xue
    Abstract:

    Cache is known for its unpredictability in embedded systems. Cache locking technique is often utilized to guarantee a tighter prediction of Worst-Case Execution Time (WCET) which is one of the most important performance metrics for embedded systems. However, in Multi-Processor Systems-on-Chip (MPSoC) systems with multi-tasks, Level 2 (L2) Cache is often shared among different tasks and cores, which leads to higher complexity in the Cache management and extended unpredictability of Cache. Task assignment has inherent relevancy for Cache Behavior, while Cache Behavior also affects the efficiency of task assignment. Task assignment and Cache Behavior have dramatic influences on the overall WCET of MPSoC. In this paper, overall WCET represents the worst-case finishing time of a set of tasks running on different cores. This paper proposes joint task assignment and Cache partitioning techniques to minimize the overall WCET for MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET, which in return facilitates task assignment and Cache partitioning. We prove that the joint problem is NP-Hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.
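
The joint objective described above, minimizing the worst-case finishing time over both the task-to-core assignment and the Cache partition, can be stated as a small exhaustive search. The sketch below is not one of the paper's algorithms; it only spells out the objective on a toy instance. It assumes that the shared L2 Cache is partitioned by ways among cores, that tasks on a core run one after another, and that a hypothetical table `wcet[task][ways]` gives each task's WCET for a given partition size. Because the search is exponential in the number of tasks, it is usable only for tiny inputs, which is consistent with the problem being NP-hard.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Plan = Tuple[int, Tuple[int, ...], Tuple[int, ...]]  # (cost, assignment, ways)


def overall_wcet(assignment: Tuple[int, ...], ways_per_core: Tuple[int, ...],
                 wcet: List[Dict[int, int]]) -> int:
    """Worst-case finishing time: the largest per-core sum of task WCETs."""
    finish = [0] * len(ways_per_core)
    for task, core in enumerate(assignment):
        finish[core] += wcet[task][ways_per_core[core]]
    return max(finish)


def best_plan(wcet: List[Dict[int, int]], num_cores: int,
              total_ways: int) -> Optional[Plan]:
    """Try every partition (>= 1 way per core) and every assignment."""
    best: Optional[Plan] = None
    for ways in product(range(1, total_ways + 1), repeat=num_cores):
        if sum(ways) != total_ways:
            continue
        for assignment in product(range(num_cores), repeat=len(wcet)):
            cost = overall_wcet(assignment, ways, wcet)
            if best is None or cost < best[0]:
                best = (cost, assignment, ways)
    return best


if __name__ == "__main__":
    # Hypothetical WCETs (in cycles) per task for 1, 2, or 3 cache ways.
    table = [
        {1: 100, 2: 80, 3: 70},
        {1: 60, 2: 50, 3: 45},
        {1: 40, 2: 30, 3: 28},
    ]
    print(best_plan(table, num_cores=2, total_ways=4))
```

In this toy instance, giving one core more Cache shortens its tasks but lengthens the tasks on the other core, which is the coupling between assignment and partitioning that the abstract points to.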

Philip Machanick - One of the best experts on this subject based on the ideXlab platform.

  • Restructuring a parallel simulation to improve Cache Behavior in a shared-memory multiprocessor: the value of distributed synchronization
    Workshop on Parallel and Distributed Simulation, 1993
    Co-Authors: David R. Cheriton, Hendrik A. Goosen, Hugh W. Holbrook, Philip Machanick
    Abstract:

    Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve Cache Behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.

  • Restructuring a parallel simulation to improve Cache Behavior in a shared-memory multiprocessor
    ACM SIGSIM Simulation Digest, 1993
    Co-Authors: David R. Cheriton, Hendrik A. Goosen, Hugh Holbrook, Philip Machanick
    Abstract:

    Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve Cache Behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.
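
The difference between a global barrier and nearest-neighbor synchronization can be sketched for a time-stepped simulation whose workers form a one-dimensional chain. In the sketch below, a worker starts step `t` as soon as its direct neighbors have finished step `t - 1`, instead of waiting for every worker. This is not the library described in the abstracts; `run_neighbor_sync` and its bookkeeping are hypothetical, and a single shared condition variable is used only to keep the example short, whereas a real implementation would use per-neighbor flags to avoid a central point of contention.

```python
import threading
from typing import List, Tuple


def run_neighbor_sync(num_workers: int, num_steps: int) -> Tuple[List[int], int]:
    """Run a chain of workers; return steps completed and the largest lead seen."""
    done = [0] * num_workers      # done[i] = number of steps worker i has finished
    max_lead = [0]                # largest observed lead of a worker over a neighbor
    cond = threading.Condition()

    def neighbors(i: int) -> List[int]:
        return [j for j in (i - 1, i + 1) if 0 <= j < num_workers]

    def worker(i: int) -> None:
        for step in range(num_steps):
            with cond:
                # Wait only for the direct neighbors, not for all workers.
                cond.wait_for(lambda: all(done[j] >= step for j in neighbors(i)))
            # The simulation work for this step would run here, outside the lock.
            with cond:
                done[i] = step + 1
                for j in neighbors(i):
                    max_lead[0] = max(max_lead[0], done[i] - done[j])
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, max_lead[0]


if __name__ == "__main__":
    finished, lead = run_neighbor_sync(num_workers=4, num_steps=5)
    print(finished, lead)
```

A worker can never run more than one step ahead of a neighbor, yet workers far apart in the chain may drift by several steps, which is what reduces sensitivity to load imbalance compared with a global barrier.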