Cache Behavior - Explore the Science & Experts

The experts below are selected from a list of 9825 experts worldwide, ranked by the ideXlab platform.

Yan Solihin - One of the best experts on this subject based on the ideXlab platform.

WEST: Cloning Data Cache Behavior using Stochastic Traces

High-Performance Computer Architecture, 2012

Co-Authors: Ganesh Balakrishnan, Yan Solihin

Abstract:

Cache designers need an in-depth understanding of end-user workloads, but some end users are reluctant to share code or traces because the code and data are proprietary or confidential. To bridge this gap, cache designers use a reduced representation of the code (a clone). A promising cloning approach is the black-box approach, in which workloads are profiled to obtain key statistics and a clone is generated automatically. Despite its potential, there are currently no highly accurate black-box cloning methods for replicating data cache behavior. We propose Workload Emulation using Stochastic Traces (WEST), a highly accurate black-box cloning technique for replicating the data cache behavior of arbitrary programs. First, we analyze which profiling statistics are necessary and sufficient to capture a workload. Then, we stochastically generate a clone that produces statistics identical to those of the proprietary workload. WEST clones can be used in lieu of the workload for exploring cache sizes, associativities, write policies, replacement policies, cache hierarchies, and co-scheduling, at a significantly reduced simulation time. We use a simple IPC model to control the rate of accesses to the cache hierarchy. We evaluated WEST using the CPU2006 and BioBench suites over a wide cache design space for single-core and dual-core CMPs. The clones achieve an average miss-ratio error of only 0.4% across 1394 single-core cache configurations. For co-scheduled mixes, WEST achieves an average miss-ratio error of only 3.1% across over 600 configurations.
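To make the stochastic-clone idea concrete, the Python sketch below samples a synthetic trace from a stack-distance histogram and replays it through a fully associative LRU cache. It is an illustration under assumptions, not WEST itself: the statistics WEST actually profiles are defined in the paper, whereas here the profile is assumed to be a single global LRU stack-distance histogram, and the names `generate_trace`, `lru_miss_ratio`, and `INF` are hypothetical.

```python
import random
from collections import OrderedDict

INF = float("inf")  # stack distance of a first-touch (cold) access


def generate_trace(histogram, length, seed=0):
    """Sample a synthetic block trace whose LRU stack distances follow `histogram`.

    `histogram` maps a stack distance (0 = re-access of the most recently used
    block) or INF to a relative frequency.
    """
    rng = random.Random(seed)
    distances = list(histogram)
    weights = [histogram[d] for d in distances]
    stack = []            # LRU stack of all blocks seen so far, MRU first
    next_block = 0
    trace = []
    for _ in range(length):
        d = rng.choices(distances, weights)[0]
        if d >= len(stack):
            # INF, or a distance deeper than the stack built so far: new block.
            block = next_block
            next_block += 1
        else:
            block = stack.pop(d)  # re-access the block at depth d
        stack.insert(0, block)
        trace.append(block)
    return trace


def lru_miss_ratio(trace, capacity):
    """Miss ratio of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()   # least recently used entry first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)
    return misses / len(trace)
```

An LRU cache holding C blocks hits exactly when an access's stack distance is below C, so with the histogram `{0: 0.5, 3: 0.3, 10: 0.15, INF: 0.05}` a 4-block cache should show a miss ratio near 0.20 (the weight of distances of 4 or more), apart from sampling noise and the few cold misses at the start of the trace.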


Reinhard Wilhelm - One of the best experts on this subject based on the ideXlab platform.

Component-Wise Instruction-Cache Behavior Prediction

Automated Technology for Verification and Analysis, 2004

Co-Authors: Abdur Rakib, Oleg Parshin, Stephan Thesing, Reinhard Wilhelm

Abstract:

The precise determination of worst-case execution times (WCETs) for programs is mostly performed on fully linked executables, since all needed information, including all machine parameters influencing cache performance, is then available to the analysis. This paper describes how to perform a component-wise prediction of instruction cache behavior that is guaranteed to be conservative compared to an analysis of the fully linked executable. The correctness of the method is proved on the basis of a previous correctness proof for the analysis of fully linked executables. The analysis is described for a general A-way set-associative cache; the only assumption is that the replacement strategy is LRU.
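Fully linked executables are easier to analyze because, in a set-associative cache, the set a memory block maps to depends on its final address. The Python sketch below computes which sets a code region occupies; its parameters (32-byte lines, 64 sets) are illustrative assumptions rather than values from the paper, and it only illustrates this address dependence, not the paper's component-wise method.

```python
def cache_set(address, line_size, num_sets):
    """Index of the cache set that a byte address maps to."""
    return (address // line_size) % num_sets


def sets_touched(start, size, line_size, num_sets):
    """Sorted list of cache sets occupied by the code region [start, start + size)."""
    first_line = start // line_size
    last_line = (start + size - 1) // line_size
    return sorted({line % num_sets for line in range(first_line, last_line + 1)})
```

A 100-byte function at address 4096 occupies sets 0 to 3; placed at 4130 it occupies sets 1 to 4, while relocating it by exactly one way size (`line_size * num_sets`, here 2048 bytes) leaves the mapping unchanged. An analysis performed before linking therefore has to stay conservative about where each component will eventually be placed.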

Efficient and Precise Cache Behavior Prediction for Real-Time Systems

Real-Time Systems, 1999

Co-Authors: Christian Ferdinand, Reinhard Wilhelm

Abstract:

Abstract interpretation is a technique for the static detection of dynamic properties of programs. It is semantics-based; that is, it computes approximate properties of the semantics of programs. On this basis, it supports correctness proofs of analyses. It replaces commonly used ad hoc techniques with systematic, provable ones, and it allows analyzers to be generated automatically from specifications by existing tools. In this work, abstract interpretation is applied to the problem of predicting the cache behavior of programs. Abstract semantics of machine programs are defined that determine the contents of caches. For interprocedural analysis, existing methods are examined and a new approach especially tailored to cache analysis is presented. This allows a static classification of the cache behavior of the memory references of programs. The calculated information can be used to improve worst-case execution time estimations. Instruction, data, and combined instruction/data caches can be analyzed for common (re)placement and write strategies. Experimental results are presented that demonstrate the applicability of the analyses.
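One building block of this line of work is the LRU "must" analysis, whose abstract state keeps, for each block, an upper bound on its age within a cache set. A reference to a block found in that state is classified as an always hit. The Python sketch below models a single A-way set with concrete block names and omits the interprocedural and address-mapping parts of the analysis; the function names are hypothetical.

```python
def must_update(state, block, assoc):
    """Abstract LRU update for the must analysis of a single cache set.

    `state` maps each block to an upper bound on its LRU age
    (0 = most recently used). A block absent from `state` is not
    guaranteed to be cached.
    """
    old_age = state.get(block, assoc)  # assoc means "not guaranteed cached"
    new_state = {block: 0}             # the accessed block becomes youngest
    for other, age in state.items():
        if other == block:
            continue
        if age < old_age:
            age += 1                   # blocks younger than `block` age by one
        if age < assoc:
            new_state[other] = age     # a bound of assoc means possibly evicted
    return new_state


def must_join(left, right):
    """Join at control-flow merges: keep blocks cached on both paths, older bound."""
    return {b: max(left[b], right[b]) for b in left.keys() & right.keys()}


def classify(state, block):
    """An access is an always hit if the must state guarantees the block is cached."""
    return "always hit" if block in state else "not classified"
```

In a 2-way set, after accesses to `a` and then `b` the state still guarantees `a`, so a following access to `a` is an always hit. At a merge point only blocks guaranteed on both incoming paths survive, each with the larger (more pessimistic) age bound.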

Cache Behavior prediction by abstract interpretation

Science of Computer Programming, 1999

Co-Authors: Christian Ferdinand, Florian Martin, Reinhard Wilhelm, Martin Alt

Abstract:

Abstract interpretation is a technique for the static detection of dynamic properties of programs. It is semantics-based; that is, it computes approximate properties of the semantics of programs. On this basis, it allows for correctness proofs of analyses. It replaces commonly used ad hoc techniques with systematic, provable ones, and it allows analyzers to be generated automatically from specifications, as in the Program Analyzer Generator (PAG). In this paper, abstract interpretation is applied to the problem of predicting the cache behavior of programs. Abstract semantics of machine programs are defined that determine the contents of caches. For interprocedural analysis, existing methods are examined and a new approach especially tailored to cache analysis is presented. This allows a static classification of the cache behavior of the memory references of programs. The calculated information can be used to sharpen worst-case execution time estimations. Instruction, data, and combined instruction/data caches can be analyzed for common (re)placement and write strategies. Experimental results are presented that demonstrate the applicability of the analysis.
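The complementary "may" analysis keeps a lower bound on each block's LRU age, so a block missing from the abstract state is definitely not cached and its reference can be classified as an always miss. Below is a single-set Python sketch with hypothetical function names, leaving out the interprocedural handling and any PAG-generated code.

```python
def may_update(state, block, assoc):
    """Abstract LRU update for the may analysis of a single cache set.

    `state` maps each block to a lower bound on its LRU age; a block absent
    from `state` is definitely not cached.
    """
    old_age = state.get(block, assoc)  # assoc means "definitely not cached"
    new_state = {block: 0}             # the accessed block becomes youngest
    for other, age in state.items():
        if other == block:
            continue
        if age <= old_age:
            age += 1                   # blocks possibly younger than `block` age by one
        if age < assoc:
            new_state[other] = age     # a lower bound of assoc means surely evicted
    return new_state


def may_join(left, right):
    """Join at control-flow merges: union of blocks, younger (smaller) bound."""
    merged = dict(left)
    for b, age in right.items():
        merged[b] = min(age, merged.get(b, age))
    return merged


def classify_may(state, block):
    """An access is an always miss if no path can have the block cached."""
    return "always miss" if block not in state else "not classified"
```

Compared with the must analysis, the update also ages blocks whose bound equals the accessed block's bound, and the join takes the union instead of the intersection. Both choices keep the lower bounds sound.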


Balakrishnan Ganesh - One of the best experts on this subject based on the ideXlab platform.

HPCA - WEST: Cloning data Cache Behavior using Stochastic Traces

IEEE International Symposium on High-Performance Computer Architecture, 2012

Co-Authors: Balakrishnan Ganesh, Yan Solihin


Chun Jason Xue - One of the best experts on this subject based on the ideXlab platform.

Joint task assignment and Cache partitioning with Cache locking for WCET minimization on MPSoC

Journal of Parallel and Distributed Computing, 2011

Co-Authors: Tiantian Liu, Yingchao Zhao, Chun Jason Xue

Abstract:

Cache locking is often used to obtain a tighter prediction of the worst-case execution time (WCET), one of the most important performance metrics for embedded systems. However, in multi-processor systems-on-chip (MPSoC) running multiple tasks, the level 2 (L2) cache is often shared among tasks and cores, which makes cache behavior even less predictable. Task assignment is inherently related to cache behavior, and cache behavior in turn affects the efficiency of task assignment; both strongly influence the overall WCET of an MPSoC. This paper proposes joint task assignment and cache partitioning techniques to minimize the overall WCET of MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET. We prove that the joint problem is NP-hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.
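To illustrate the partitioning half of the joint problem, the sketch below assumes a fixed task-to-core assignment and a hypothetical table `wcet[task][w]` giving each task's WCET when its core owns `w` locked L2 ways. It then hands out ways one at a time so as to minimize the overall WCET (the largest per-core finishing time). This is a simple greedy heuristic chosen for illustration, not one of the algorithms proposed in the paper.

```python
def core_time(tasks, ways, wcet):
    """Finishing time of a core that runs `tasks` back to back with `ways` locked ways."""
    # Tables are assumed non-increasing; extra ways beyond the table give no gain.
    return sum(wcet[t][min(ways, len(wcet[t]) - 1)] for t in tasks)


def partition_ways(assignment, total_ways, wcet):
    """Greedily split `total_ways` L2 ways among cores; return (ways, overall WCET)."""
    if not assignment:
        return [], 0
    ways = [0] * len(assignment)
    for _ in range(total_ways):
        best_core, best_key = None, None
        for c in range(len(assignment)):
            ways[c] += 1                       # tentatively give core c one more way
            times = [core_time(tasks, w, wcet) for tasks, w in zip(assignment, ways)]
            key = (max(times), sum(times))     # minimize makespan, then total time
            ways[c] -= 1
            if best_key is None or key < best_key:
                best_core, best_key = c, key
        ways[best_core] += 1
    times = [core_time(tasks, w, wcet) for tasks, w in zip(assignment, ways)]
    return ways, max(times)
```

With `wcet = {"A": [10, 6, 4, 4], "B": [8, 7, 7, 7], "C": [9, 5, 3, 2]}`, tasks `[["A"], ["B", "C"]]` and three ways, core 1 receives two ways and core 0 one, for an overall WCET of 10. Because the joint problem is NP-hard, a greedy pass like this carries no optimality guarantee.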

ICPP - Task Assignment with Cache Partitioning and Locking for WCET Minimization on MPSoC

2010 39th International Conference on Parallel Processing, 2010

Co-Authors: Tiantian Liu, Yingchao Zhao, Chun Jason Xue

Abstract:

Caches are known to be a source of unpredictability in embedded systems. Cache locking is often used to obtain a tighter prediction of the worst-case execution time (WCET), one of the most important performance metrics for embedded systems. However, in multi-processor systems-on-chip (MPSoC) running multiple tasks, the level 2 (L2) cache is often shared among tasks and cores, which makes cache management more complex and cache behavior even less predictable. Task assignment is inherently related to cache behavior, and cache behavior in turn affects the efficiency of task assignment; both strongly influence the overall WCET of an MPSoC. In this paper, the overall WCET is the worst-case finishing time of a set of tasks running on different cores. This paper proposes joint task assignment and cache partitioning techniques to minimize the overall WCET of MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET, which in turn facilitates task assignment and cache partitioning. We prove that the joint problem is NP-hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.
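If per-task WCETs are held fixed (ignoring the cache interaction the paper addresses), the overall WCET of an assignment is the maximum over cores of the sum of that core's task WCETs. The sketch below assigns tasks with the classic longest-processing-time-first (LPT) list-scheduling heuristic. It is a stand-in for illustration, not the paper's algorithm, and the task names and costs are made up.

```python
import heapq


def lpt_assign(wcets, num_cores):
    """Assign tasks to cores longest-first; return (assignment, overall WCET)."""
    heap = [(0, core) for core in range(num_cores)]    # (finish time, core id)
    assignment = [[] for _ in range(num_cores)]
    # Longest task first; ties broken by task name for determinism.
    for task, cost in sorted(wcets.items(), key=lambda kv: (-kv[1], kv[0])):
        finish, core = heapq.heappop(heap)             # least-loaded core so far
        assignment[core].append(task)
        heapq.heappush(heap, (finish + cost, core))
    overall = max(finish for finish, _ in heap)        # worst-case finishing time
    return assignment, overall
```

For WCETs 7, 5, 4, 3, 3 on two cores, LPT yields an overall WCET of 12, while the split {7, 4} / {5, 3, 3} achieves 11. Even without the cache interaction, a simple heuristic can miss the optimum.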


Philip Machanick - One of the best experts on this subject based on the ideXlab platform.

Restructuring a Parallel Simulation to Improve Cache Behavior in a Shared-Memory Multiprocessor: The Value of Distributed Synchronization

Workshop on Parallel and Distributed Simulation, 1993

Co-Authors: David R. Cheriton, Hendrik A. Goosen, Hugh W. Holbrook, Philip Machanick

Abstract:

Synchronization is a significant cost in many parallel programs, and it can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior. This paper presents the results of an initial single-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind-tunnel simulation. For this application on current shared-memory multiprocessors, our results show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without making application programming more complex than with conventional approaches.
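The difference between a global barrier and nearest-neighbor synchronization can be sketched on a one-dimensional time-stepped stencil. Each worker owns a contiguous block of cells and, before computing step s + 1, waits only until its left and right neighbors have finished step s. This is a generic Python threading illustration, not the MP3D code or the library described in the paper. It keeps every time step in memory for clarity and includes a sequential reference for comparison.

```python
import threading


def simulate_sequential(initial, steps):
    """Reference: 3-point averaging stencil with clamped boundaries."""
    state = list(initial)
    n = len(state)
    for _ in range(steps):
        state = [((state[c - 1] if c > 0 else state[c]) + state[c]
                  + (state[c + 1] if c < n - 1 else state[c])) / 3.0
                 for c in range(n)]
    return state


def simulate_neighbor_sync(initial, steps, workers):
    """Same stencil, one thread per block of cells, no global barrier."""
    n = len(initial)
    if not 1 <= workers <= n:
        raise ValueError("need 1 <= workers <= number of cells")
    grid = [list(initial)] + [[0.0] * n for _ in range(steps)]
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    progress = [0] * workers                 # last completed step per worker
    conds = [threading.Condition() for _ in range(workers)]

    def wait_for_neighbor(j, step):
        with conds[j]:
            conds[j].wait_for(lambda: progress[j] >= step)

    def worker(w):
        lo, hi = bounds[w]
        for s in range(steps):
            # Only the two adjacent workers must have finished step s.
            for j in (w - 1, w + 1):
                if 0 <= j < workers:
                    wait_for_neighbor(j, s)
            prev, nxt = grid[s], grid[s + 1]
            for c in range(lo, hi):
                left = prev[c - 1] if c > 0 else prev[c]
                right = prev[c + 1] if c < n - 1 else prev[c]
                nxt[c] = (left + prev[c] + right) / 3.0
            with conds[w]:
                progress[w] = s + 1
                conds[w].notify_all()        # wake neighbors waiting on this worker

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grid[steps]
```

A slow worker delays only its two neighbors directly, rather than every worker at a barrier. In CPython the threads do not execute the arithmetic in parallel, so the sketch demonstrates the synchronization pattern rather than a speedup.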

Restructuring a parallel simulation to improve Cache Behavior in a shared-memory multiprocessor

ACM SIGSIM Simulation Digest, 1993

Co-Authors: David R. Cheriton, Hendrik A. Goosen, Hugh Holbrook, Philip Machanick

Abstract:

Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve Cache Behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.

