Cache Coherence Problem

The Experts below are selected from a list of 33 Experts worldwide ranked by ideXlab platform

David J Lilja - One of the best experts on this subject based on the ideXlab platform.

  • Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons
    ACM Computing Surveys, 1993
    Co-Authors: David J Lilja
    Abstract:

    Private data caches have not been as effective in reducing the average memory delay in multiprocessors as in uniprocessors due to data spreading among the processors, and due to the cache coherence problem. A wide variety of mechanisms have been proposed for maintaining cache coherence in large-scale shared memory multiprocessors, making it difficult to compare their performance and implementation implications. To help the computer architect understand some of the trade-offs involved, this paper surveys current cache coherence mechanisms and identifies several issues critical to their design. These design issues include: 1) the coherence detection strategy, through which possibly incoherent memory accesses are detected either statically at compile time or dynamically at run time; 2) the coherence enforcement strategy, such as updating or invalidating, that is used to ensure that stale cache entries are never referenced by a processor; 3) how the precision of block sharing information can be changed to trade off the implementation cost and the performance of the coherence mechanism; and 4) how the cache block size affects the performance of the memory system. Trace-driven simulations are used to compare the performance and implementation impacts of these different issues. In addition, hybrid strategies are presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.
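
The two enforcement strategies named in the abstract (invalidating and updating) can be illustrated with a toy model. This is a minimal sketch, not the survey's own simulator: the class `ToyCoherentSystem`, its method names, and the write-through simplification are all assumptions made for illustration.

```python
class ToyCoherentSystem:
    """Toy model (hypothetical): one shared memory, one private cache per processor."""

    def __init__(self, num_procs, policy="invalidate"):
        if policy not in ("invalidate", "update"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.memory = {}                                # addr -> value
        self.caches = [dict() for _ in range(num_procs)]

    def read(self, proc, addr):
        cache = self.caches[proc]
        if addr not in cache:                           # miss: fetch from shared memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, proc, addr, value):
        # Assumption: write-through, so shared memory is always current.
        self.memory[addr] = value
        self.caches[proc][addr] = value
        for other, cache in enumerate(self.caches):
            if other == proc or addr not in cache:
                continue
            if self.policy == "invalidate":
                del cache[addr]                         # stale copy is discarded
            else:
                cache[addr] = value                     # stale copy is refreshed in place


system = ToyCoherentSystem(num_procs=2, policy="invalidate")
system.read(1, 0x10)           # processor 1 caches the block
system.write(0, 0x10, 42)      # processor 0 writes; processor 1's copy is invalidated
print(system.read(1, 0x10))    # processor 1 misses and refetches the new value
```

Under either policy a processor never reads a stale entry; the difference is whether a remote copy is dropped (costing a later miss) or refreshed (costing traffic on every write).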

Chen Yung-chin - One of the best experts on this subject based on the ideXlab platform.

  • Cache Design and Performance in a Large-Scale Shared-Memory Multiprocessor System
    1993
    Co-Authors: Chen Yung-chin
    Abstract:

    200 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1993. The use of a private cache in each processor of large-scale shared-memory multiprocessor systems can reduce long global memory latency but also introduces the cache coherence problem. Cache design and performance in a large-scale multiprocessor are affected by the cache coherence problem and the cache coherence scheme implemented. The behavior of a parallel program usually differs from that of the same program executed sequentially. Consequently, the cache behaves differently and may not perform as well as the cache in a uniprocessor system. Some results of previous cache studies for a uniprocessor system are less applicable to multiprocessor caches. In this thesis, the cache design and performance using a directory and a software coherence scheme in multistage-interconnection-network-based multiprocessor systems are studied using trace-driven timing simulation of numerical benchmarks. Design complexity and performance trade-offs for both schemes are studied. Their performance problems are analyzed in detail, and several improvements are proposed, evaluated, and shown to be effective in improving performance. Next, the performance of the directory and software schemes is compared; the simple software scheme is shown to have better performance for numerical programs. The performance advantages and disadvantages of the two schemes are analyzed, and a new coherence scheme combining the best of both schemes is proposed. This new scheme is shown to achieve higher hit ratios. Overall, the global memory remains one of the major performance bottlenecks for a multiprocessor system even though private caches are being used. The effectiveness of memory caches in reducing global memory access latency is demonstrated.

Choonki Jang - One of the best experts on this subject based on the ideXlab platform.

  • A Software-Managed Coherent Memory Architecture for Manycores
    International Conference on Parallel Architectures and Compilation Techniques, 2011
    Co-Authors: Jungho Park, Choonki Jang
    Abstract:

    Cache-coherent Non-Uniform Memory Access (cc-NUMA) architectures have been widely used for chip multiprocessors (CMPs). However, they require complicated hardware to properly handle the cache coherence problem. Moreover, they generate heavy on-chip network traffic due to coherence enforcement. In this work, we propose a simple software-managed coherent memory architecture for manycores. Our memory architecture exploits explicitly addressed local stores. Instead of implementing a complicated cache coherence protocol in hardware, coherence and consistency are supported by software, such as a runtime or an operating system. The local stores together with the software leverage conventional caches to make the architecture much simpler and to generate much less network traffic than conventional cc-NUMA-based CMPs. Experimental results indicate that our approach is promising.
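
As a rough illustration of coherence enforced by software rather than hardware, the sketch below models cores with explicitly managed local stores that synchronize with shared memory only at release and acquire points. It is a generic release/acquire-style scheme chosen for illustration, not the protocol from the paper; `LocalStoreCore` and all of its methods are hypothetical names.

```python
class LocalStoreCore:
    """Hypothetical core with a software-managed local store (no hardware coherence)."""

    def __init__(self, shared):
        self.shared = shared    # dict standing in for shared memory
        self.local = {}         # explicitly managed local store
        self.dirty = set()      # addresses written locally but not yet published

    def load(self, addr):
        if addr not in self.local:
            self.local[addr] = self.shared.get(addr, 0)
        return self.local[addr]

    def store(self, addr, value):
        self.local[addr] = value
        self.dirty.add(addr)

    def release(self):
        # The software runtime writes dirty data back so other cores can see it.
        for addr in sorted(self.dirty):
            self.shared[addr] = self.local[addr]
        self.dirty.clear()

    def acquire(self):
        # The software runtime drops clean copies so later loads refetch.
        self.local = {a: v for a, v in self.local.items() if a in self.dirty}


shared = {}
producer, consumer = LocalStoreCore(shared), LocalStoreCore(shared)
consumer.load(0)            # consumer holds a local copy of address 0
producer.store(0, 5)        # not yet visible: no hardware propagates the write
producer.release()          # explicit write-back by software
consumer.acquire()          # explicit invalidation by software
print(consumer.load(0))     # consumer now refetches the published value
```

No traffic is generated between the synchronization points, which is the intuition behind the reduced network load the abstract reports; the cost is that software must place the release and acquire calls correctly.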

Donald Yeung - One of the best experts on this subject based on the ideXlab platform.

  • The MIT Alewife Machine: Architecture and Performance
    International Symposium on Computer Architecture, 1995
    Co-Authors: Anant Agarwal, David Chaiken, Ricardo Bianchini, Kirk L. Johnson, David M. Kranz, John Kubiatowicz, Beng-hong Lim, Kenneth Mackenzie, Donald Yeung
    Abstract:

    Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable and programmable. Four mechanisms combine to achieve these goals: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms (including block multithreading and prefetching) mask unavoidable delays due to communication. Microbenchmarks, together with over a dozen complete applications running on the 32-node prototype, help to analyze the behavior of the system. Analysis shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives. Block multithreading and prefetching improve performance by up to 25% individually, and 35% together. Finally, language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.

Veljko Milutinovic - One of the best experts on this subject based on the ideXlab platform.

  • The Cache Coherence Problem in Shared-Memory Multiprocessors: Software Solutions
    1996
    Co-Authors: Milo Tomasevic, Veljko Milutinovic
    Abstract:

    From the Publisher: Almost all software solutions are developed through academic research and implemented only in prototype machines, leaving the field of software techniques for maintaining cache coherence wide open for new research and development. This book is a collection of all representative approaches to software coherence maintenance and includes a number of related studies in the performance evaluation field. The book illustrates state-of-the-art software solutions for cache coherence maintenance in shared-memory multiprocessors. It begins with a brief overview of the cache coherence problem and introduces software solutions to the problem. The text defines and details static and dynamic software schemes, techniques for modeling performance evaluation mechanisms, and performance evaluation studies. The book is intended for the reader who is experienced in computer engineering but possibly a novice in the topic of cache coherence. It also provides an in-depth understanding of the problem as well as a comprehensive overview for multicomputer designers, computer architects, and compiler writers. In addition, it is a software coherence reference handbook for advanced undergraduate and typical graduate students in multiprocessing and multiprogramming areas.