Sequential Consistency


The Experts below are selected from a list of 9897 Experts worldwide ranked by ideXlab platform

Shaz Qadeer - One of the best experts on this subject based on the ideXlab platform.

  • automatic verification of Sequential Consistency for unbounded addresses and data values
    Lecture Notes in Computer Science, 2004
    Co-Authors: Jesse Bingham, Shaz Qadeer, Anne Condon, Zhichuan Zhang
    Abstract:

    Sequential Consistency is the archetypal correctness condition for the memory protocols of shared-memory multiprocessors. Typically, such protocols are parameterized by the number of processors, the number of addresses, and the number of distinguishable data values, and typically, automatic protocol verification analyzes only concrete instances of the protocol with small values (generally < 3) for the protocol parameters. This paper presents a fully automatic method for proving the Sequential Consistency of an entire parameterized family of protocols, with the number of processors fixed, but the number of addresses and data values being unbounded parameters. Using some practical, reasonable assumptions (data independence, processor symmetry, location symmetry, simple store ordering, some syntactic restrictions), the method automatically generates a finite-state protocol from the parameterized protocol description; proving Sequential Consistency of the abstract model, via known methods, guarantees Sequential Consistency of the entire protocol family. The method is sound but incomplete; we argue, however, that it is likely to apply to most real protocols. We present experimental results showing the effectiveness of our method on parameterized versions of the Piranha shared memory protocol and an extended version of a directory protocol from the University of Wisconsin Multifacet Project.

  • verifying Sequential Consistency on shared memory multiprocessors by model checking
    IEEE Transactions on Parallel and Distributed Systems, 2003
    Co-Authors: Shaz Qadeer
    Abstract:

    The memory model of a shared-memory multiprocessor is a contract between the designer and the programmer of the multiprocessor. A memory model is typically implemented by means of a cache-coherence protocol. The design of this protocol is one of the most complex aspects of multiprocessor design and is consequently quite error-prone. However, it is imperative to ensure that the cache-coherence protocol satisfies the shared-memory model. We present a novel technique based on model checking to tackle this difficult problem for the important and well-known shared-memory model of Sequential Consistency. Surprisingly, verifying Sequential Consistency is undecidable in general, even for finite-state cache-coherence protocols. In practice, cache-coherence protocols satisfy the properties of causality and data independence. Causality is the property that values of read events flow from values of write events. Data independence is the property that all traces can be generated by renaming data values from traces where the written values are pairwise distinct. We show that, if a causal and data independent system also has the property that the logical order of write events to each location is identical to their temporal order, then Sequential Consistency is decidable. We present a novel model checking algorithm to verify Sequential Consistency on such systems for a finite number of processors and memory locations and an arbitrary number of data values.

  • verifying Sequential Consistency on shared memory multiprocessors by model checking
    arXiv: Distributed Parallel and Cluster Computing, 2001
    Co-Authors: Shaz Qadeer
    Abstract:

    The memory model of a shared-memory multiprocessor is a contract between the designer and programmer of the multiprocessor. The Sequential Consistency memory model specifies a total order among the memory (read and write) events performed at each processor. A trace of a memory system satisfies Sequential Consistency if there exists a total order of all memory events in the trace that is both consistent with the total order at each processor and has the property that every read event to a location returns the value of the last write to that location. Descriptions of shared-memory systems are typically parameterized by the number of processors, the number of memory locations, and the number of data values. It has been shown that even for finite parameter values, verifying Sequential Consistency on general shared-memory systems is undecidable. We observe that, in practice, shared-memory systems satisfy the properties of causality and data independence. Causality is the property that values of read events flow from values of write events. Data independence is the property that all traces can be generated by renaming data values from traces where the written values are distinct from each other. If a causal and data independent system also has the property that the logical order of write events to each location is identical to their temporal order, then Sequential Consistency can be verified algorithmically. Specifically, we present a model checking algorithm to verify Sequential Consistency on such systems for a finite number of processors and memory locations and an arbitrary number of data values.
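The trace-based definition above lends itself to a direct, if exponential, check on small traces: enumerate the interleavings that respect each processor's order and test whether any of them is a trace of serial memory. The sketch below is our own illustration of that definition, not code from the paper; the event encoding is assumed, and memory locations are assumed to start at 0.

```python
from itertools import permutations

def is_sequentially_consistent(procs):
    """procs: one event list per processor; an event is ("R", loc, val)
    or ("W", loc, val). True iff some interleaving preserving each
    processor's program order is a trace of serial memory (every read
    returns the last write to its location; locations start at 0).
    Exponential brute force, so only usable on tiny traces."""
    tagged = [(p, i, ev) for p, seq in enumerate(procs)
              for i, ev in enumerate(seq)]
    for perm in permutations(tagged):
        # Skip interleavings that violate some processor's program order.
        next_index = [0] * len(procs)
        in_order = True
        for p, i, _ in perm:
            if i != next_index[p]:
                in_order = False
                break
            next_index[p] += 1
        if not in_order:
            continue
        # Replay against serial memory: every read must see the last write.
        mem = {}
        legal = True
        for _, _, (op, loc, val) in perm:
            if op == "W":
                mem[loc] = val
            elif mem.get(loc, 0) != val:
                legal = False
                break
        if legal:
            return True
    return False
```

For example, a trace where one processor writes 1 to x and another reads 1 from x is accepted, while the store-buffering trace in which both processors write 1 and then each reads 0 from the other's location has no sequentially consistent justification and is rejected.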

  • verifying Sequential Consistency on shared memory multiprocessor systems
    Computer Aided Verification, 1999
    Co-Authors: Thomas A Henzinger, Shaz Qadeer, Sriram K Rajamani
    Abstract:

    In shared-memory multiprocessors, Sequential Consistency offers a natural tradeoff between the flexibility afforded to the implementor and the complexity of the programmer's view of the memory. Sequential Consistency requires that some interleaving of the local temporal orders of read/write events at different processors be a trace of serial memory. We develop a systematic methodology for proving Sequential Consistency for memory systems with three parameters: the number of processors, the number of memory locations, and the number of data values. From the definition of Sequential Consistency it suffices to construct a non-interfering observer that watches and reorders read/write events so that a trace of serial memory is obtained. While in general such an observer must be unbounded even for fixed values of the parameters (checking Sequential Consistency is undecidable), we show that for two paradigmatic protocol classes, lazy caching and snoopy cache coherence, there exist finite-state observers. In these cases, Sequential Consistency for fixed parameter values can thus be checked by language inclusion between finite automata. In order to reduce the arbitrary-parameter problem to the fixed-parameter problem, we develop a novel framework for induction over the number of processors. Classical induction schemas, which are based on process invariants that are inductive with respect to an implementation preorder that preserves the temporal sequence of events, are inadequate for our purposes, because proving Sequential Consistency requires the reordering of events. Hence we introduce merge invariants, which permit certain reorderings of read/write events.
We show that under certain reasonable assumptions about the memory system, it is possible to conclude Sequential Consistency for any number of processors, memory locations, and data values by model checking two finite-state lemmas about process and merge invariants: they involve two processors each accessing a maximum of three locations, where each location stores at most two data values. For both lazy caching and snoopy cache coherence we are able to discharge the two lemmas using the model checker MOCHA.

Michel Raynal - One of the best experts on this subject based on the ideXlab platform.

  • value-based Sequential Consistency for set objects in dynamic distributed systems
    European Conference on Parallel Processing, 2010
    Co-Authors: Roberto Baldoni, Silvia Bonomi, Michel Raynal
    Abstract:

    This paper introduces a shared object, namely a set object that allows processes to add and remove values as well as take a snapshot of its content. A new Consistency condition suited to such an object is introduced. This condition, named value-based Sequential Consistency, is weaker than linearizability. The paper also addresses the construction of a set object in a synchronous anonymous distributed system where participants can continuously join and leave the system. Interestingly, the protocol is proved correct under the assumption that some constraint on the churn is satisfied. This shows that the notion of "provably correct software" can be applied to dynamic systems.

  • a methodological construction of an efficient Sequential Consistency protocol
    Network Computing and Applications, 2004
    Co-Authors: Vicent Cholvi, Antonio Fernandez, Ernesto Jimenez, Michel Raynal
    Abstract:

    A concurrent object is an object that can be concurrently accessed by several processes. Sequential Consistency is a Consistency criterion for such objects. Informally, it states that a multiprocess program executes correctly if its results could have been produced by executing that program on a single processor system. (Sequential Consistency is weaker than atomic Consistency, the usual Consistency criterion, as it does not refer to real-time.) The paper proposes a simple protocol that ensures Sequential Consistency when the shared memory abstraction is supported by the local memories of nodes that can communicate only by exchanging messages through reliable channels. Differently from other Sequential Consistency protocols, the proposed protocol does not rely on a strong synchronization mechanism such as an atomic broadcast primitive or a central node managing a copy of every shared object. From a methodological point of view, the protocol is built incrementally starting from the very definition of Sequential Consistency. It has the noteworthy property of providing fast write operations (i.e., a process never has to wait when it writes a new value to a shared object). Depending on the current local state, some read operations can also be fast. An experimental evaluation of the protocol is also presented. The proposed protocol could be used to manage Web page caching.

  • Sequential Consistency as lazy linearizability
    Conference Information and Communication Technology, 2002
    Co-Authors: Michel Raynal
    Abstract:

    This paper shows that Sequential Consistency is actually a form of "lazy" atomic Consistency. More precisely, it proposes a new, particularly simple Sequential Consistency protocol that orders the conflicting operations on each object separately, and appropriately invalidates object copies to prevent Consistency violation. When compared to invalidation-based protocols that ensure atomic Consistency (such as Li-Hudak's protocol), the proposed protocol can be seen as using lazy invalidation. Hence, in addition to a new Consistency protocol, the paper provides a new insight into the concepts and mechanisms that underlie Consistency protocols: while atomic Consistency is based on physical time and requires eager invalidation, Sequential Consistency is based on logical time and needs only lazy invalidation.

  • Sequential Consistency as lazy linearizability
    ACM Symposium on Parallel Algorithms and Architectures, 2002
    Co-Authors: Michel Raynal
    Abstract:

    This paper shows that, from an implementation point of view, Sequential Consistency can be considered a form of lazy linearizability. This claim is supported by a versatile protocol that can be tailored to implement either of the two conditions.

  • from causal Consistency to Sequential Consistency in shared memory systems
    Foundations of Software Technology and Theoretical Computer Science, 1995
    Co-Authors: Michel Raynal, Andre Schiper
    Abstract:

    Sequential Consistency and causal Consistency constitute two of the main Consistency criteria used to define the semantics of accesses in the shared memory model. An execution is Sequentially consistent if all processes can agree on a same legal Sequential history of all the accesses; if processes perceive distinct legal Sequential histories of all the accesses, the execution is only causally consistent (legality means that a read does not get an overwritten value).

Mark D Hill - One of the best experts on this subject based on the ideXlab platform.

  • Sequential Consistency for heterogeneous-race-free: programmer-centric memory models for heterogeneous platforms
    2013
    Co-Authors: Derek R Hower, Mark D Hill, Bradford M Beckmann, Benedict R Gaster, Blake A Hechtman, Steven K Reinhardt, David A Wood
    Abstract:

    Hardware vendors now provide heterogeneous platforms in commodity markets (e.g., integrated CPUs and GPUs), and are promising an integrated, shared memory address space for such platforms in future iterations. Because not all threads in a heterogeneous platform can communicate with the same latency, vendors are proposing synchronization mechanisms that allow threads to communicate with a subset of threads (called a scope). However, vendors have yet to define a comprehensive and portable memory model that programmers can use to reason about scopes. Moreover, existing CPU memory models, such as Sequential Consistency for Data-Race-Free (SC for DRF), are ill-suited, in part, because they define all synchronization operations globally and preclude low-energy, high-performance local coordination. To address this, we embrace scoped synchronization with a new class of memory Consistency models: Sequential Consistency for Heterogeneous-Race-Free (SC for HRF). Inspired by SC for DRF (C++, Java), the new models provide programmers with SC for programs with "sufficient" synchronization (no data races) of "sufficient" scope. We develop the first such model, called HRF0, show how it can be used to develop high-performance code, show example hardware support, and motivate future work.

  • Lamport clocks: verifying a directory cache-coherence protocol
    ACM Symposium on Parallel Algorithms and Architectures, 1998
    Co-Authors: Manoj Plakal, Daniel J Sorin, Anne Condon, Mark D Hill
    Abstract:

    Modern shared-memory multiprocessors use complex memory system implementations that include a variety of non-trivial and interacting optimizations. More time is spent in verifying the correctness of such implementations than in designing the system. In particular, large-scale Distributed Shared Memory (DSM) systems usually rely on a directory cache-coherence protocol to provide the illusion of a Sequentially consistent shared address space. Verifying that such a distributed protocol satisfies Sequential Consistency is a difficult task. Current formal protocol verification techniques [18] complement simulation, but are somewhat nonintuitive to system designers and verifiers, and they do not scale well to practical systems. In this paper, we examine a new reasoning technique that is precise and (we find) intuitive. Our technique is based on Lamport's logical clocks, which were originally used in distributed systems. We make modest extensions to Lamport's logical clocking scheme to assign timestamps to relevant protocol events to construct a total ordering of such events. Such total orderings can be used to verify that the requirements of a particular memory Consistency model have been satisfied. We apply Lamport clocks to prove that a non-trivial directory protocol implements Sequential Consistency. To do this, we describe an SGI Origin 2000-like protocol [12] in detail, provide a timestamping scheme that totally orders all protocol events, and then prove Sequential Consistency (i.e., a load always returns the value of the "last" store to the same address in timestamp order).
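The logical clocks the technique builds on are simple to state. The sketch below is a generic message-passing version of Lamport's scheme, our own illustration rather than the paper's protocol-specific timestamping:

```python
class LamportClock:
    """Lamport's logical clock: local events and message sends advance
    the counter; a receive jumps past the sender's timestamp. If event
    e1 happens-before e2, then ts(e1) < ts(e2); breaking ties (e.g. by
    process id) then yields a total order of all events."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Timestamp a local event or a message send."""
        self.time += 1
        return self.time

    def recv(self, sender_time):
        """Timestamp the receipt of a message stamped sender_time."""
        self.time = max(self.time, sender_time) + 1
        return self.time
```

A send/receive pair is always ordered: `q.recv(p.tick())` returns a timestamp strictly greater than the send's, which is the property the paper's extended timestamping scheme relies on to totally order protocol events.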

  • a unified formalization of four shared memory models
    IEEE Transactions on Parallel and Distributed Systems, 1993
    Co-Authors: Sarita V Adve, Mark D Hill
    Abstract:

    The authors present data-race-free-1, a shared-memory model that unifies four earlier models: weak ordering, release Consistency (with Sequentially consistent special operations), the VAX memory model, and data-race-free-0. Data-race-free-1 unifies these models by formalizing the intuition that if programs synchronize explicitly and correctly, then Sequential Consistency can be guaranteed with high performance in a manner that retains the advantages of each of the four models. Data-race-free-1 expresses the programmer's interface more explicitly and formally than weak ordering and the VAX, and allows an implementation not allowed by weak ordering, release Consistency, or data-race-free-0. The implementation proposal for data-race-free-1 differs from earlier implementations by permitting the execution of all synchronization operations of a processor even while previous data operations of the processor are in progress. To ensure Sequential Consistency, two synchronizing processors exchange information to delay later operations of the second processor that conflict with an incomplete data operation of the first processor.

Madanlal Musuvathi - One of the best experts on this subject based on the ideXlab platform.

  • accelerating Sequential Consistency for java with speculative compilation
    Programming Language Design and Implementation, 2019
    Co-Authors: Lun Liu, Todd Millstein, Madanlal Musuvathi
    Abstract:

    A memory Consistency model (or simply a memory model) specifies the granularity and the order in which memory accesses by one thread become visible to other threads in the program. We previously proposed the volatile-by-default (VBD) memory model as a natural form of Sequential Consistency (SC) for Java. VBD is significantly stronger than the Java memory model (JMM) and incurs relatively modest overheads in a modified HotSpot JVM running on Intel x86 hardware. However, the x86 memory model is already quite close to SC. It is expected that the cost of VBD will be much higher on the other widely used hardware platform today, namely ARM, whose memory model is very weak. In this paper, we quantify this expectation by building and evaluating a baseline volatile-by-default JVM for ARM called VBDA-HotSpot, using the same technique previously used for x86. Through this baseline we report, to the best of our knowledge, the first comprehensive study of the cost of providing language-level SC for a production compiler on ARM. VBDA-HotSpot indeed incurs a considerable performance penalty on ARM, with average overheads on the DaCapo benchmarks on two ARM servers of 57% and 73% respectively. Motivated by these experimental results, we then present a novel speculative technique to optimize language-level SC. While several prior works have shown how to optimize SC in the context of an offline, whole-program compiler, to our knowledge this is the first optimization approach that is compatible with modern implementation technology, including dynamic class loading and just-in-time (JIT) compilation. The basic idea is to modify the JIT compiler to treat each object as thread-local initially, so accesses to its fields can be compiled without fences. If an object is ever accessed by a second thread, any speculatively compiled code for the object is removed, and future JITed code for the object will include the necessary fences in order to ensure SC. 
We demonstrate that this technique is effective, reducing the overhead of enforcing VBD by one-third on average, and additional experiments validate the thread-locality hypothesis that underlies the approach.

  • bundled VM artifact to accompany the paper "SC-Haskell: Sequential Consistency in languages that minimize mutable shared heap"
    2017
    Co-Authors: Michael Vollmer, Madanlal Musuvathi, Ryan G Scott, Ryan R Newton
    Abstract:

    A single ".ova" file encompassing an Ubuntu-based (v. 16.04) virtual machine (VM). This VM is the "artifact" used in the artifact evaluation of the corresponding PPoPP17 paper. It contains the complete software, such as example programs and benchmark programs, necessary to reproduce the results of the paper. Scripts are included in the virtual machine to automatically run benchmarks and compare their results. Download the VM by clicking the link below under "Link(s) to data and video for this item."

  • SC-Haskell: Sequential Consistency in languages that minimize mutable shared heap
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
    Co-Authors: Michael Vollmer, Madanlal Musuvathi, Ryan G Scott, Ryan R Newton
    Abstract:

    A core, but often neglected, aspect of a programming language design is its memory (Consistency) model. Sequential Consistency (SC) is the most intuitive memory model for programmers, as it guarantees Sequential composition of instructions and provides a simple abstraction of shared memory as a single global store with atomic reads and writes. Unfortunately, SC is widely considered to be impractical due to its associated performance overheads. Perhaps contrary to popular opinion, this paper demonstrates that SC is achievable with acceptable performance overheads for mainstream languages that minimize mutable shared heap. In particular, we modify the Glasgow Haskell Compiler to insert fences on all writes to shared mutable memory accessed in nonfunctional parts of the program. For a benchmark suite containing 1,279 programs, SC adds a geomean overhead of less than 0.4% on an x86 machine. The efficiency of SC arises primarily due to the isolation provided by the Haskell type system between purely functional and thread-local imperative computations on the one hand, and imperative computations on the global heap on the other. We show how to use new programming idioms to further reduce the SC overhead; these create a virtuous cycle of less overhead and even stronger semantic guarantees (static data-race freedom).

  • End-to-end Sequential Consistency
    2012 39th Annual International Symposium on Computer Architecture (ISCA), 2012
    Co-Authors: Abhayendra Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, Madanlal Musuvathi
    Abstract:

    Sequential Consistency (SC) is arguably the most intuitive behavior for a shared-memory multithreaded program. It is widely accepted that language-level SC could significantly improve the programmability of a multiprocessor system. However, efficiently supporting end-to-end SC remains a challenge, as it requires that both compiler and hardware optimizations preserve SC semantics. While a recent study has shown that a compiler can preserve SC semantics for a small performance cost, an efficient and complexity-effective SC hardware remains elusive. Past hardware solutions relied on aggressive speculation techniques, which have not yet been realized in a practical implementation. This paper exploits the observation that hardware need not enforce any memory model constraints on accesses to thread-local and shared read-only locations. A processor can easily determine a large fraction of these safe accesses with assistance from static compiler analysis and the hardware memory management unit. We discuss a low-complexity hardware design that exploits this information to reduce the overhead in ensuring SC. Our design employs an additional unordered store buffer for fast-tracking thread-local stores and allowing later memory accesses to proceed without a memory-ordering-related stall. Our experimental study shows that the cost of guaranteeing end-to-end SC is only 6.2% on average when compared to a system with TSO hardware executing a stock compiler's output.
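The gap between SC and TSO that this line of work targets shows up concretely in the classic store-buffering litmus test. The enumeration below is our own illustration, not code from the paper: under SC, interleaving P0: x=1; r1=y with P1: y=1; r2=x can never produce r1 = r2 = 0, whereas a TSO store buffer that delays the writes past the reads permits exactly that outcome.

```python
from itertools import product

def sc_outcomes():
    """All (r1, r2) results of the store-buffering litmus test under
    Sequential Consistency: interleave P0 = [x=1; r1=y] and
    P1 = [y=1; r2=x] in every program-order-preserving way."""
    progs = [
        [("W", "x", 1), ("R", "y", "r1")],  # P0
        [("W", "y", 1), ("R", "x", "r2")],  # P1
    ]
    outcomes = set()
    for schedule in product((0, 1), repeat=4):
        if schedule.count(0) != 2:
            continue  # each processor must execute exactly its two ops
        mem = {"x": 0, "y": 0}
        pc = [0, 0]
        regs = {}
        for p in schedule:
            op, loc, arg = progs[p][pc[p]]
            pc[p] += 1
            if op == "W":
                mem[loc] = arg       # SC: a store hits memory immediately
            else:
                regs[arg] = mem[loc]
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes
```

The reachable set is {(0, 1), (1, 0), (1, 1)}: the TSO-only outcome (0, 0) never appears, and enforcing its absence cheaply is the ordering obligation the paper's hardware design addresses.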

Werner Krauth - One of the best experts on this subject based on the ideXlab platform.

  • Multithreaded event-chain Monte Carlo with local times
    Computer Physics Communications, 2020
    Co-Authors: Synge Todo, A. Maggs, Werner Krauth
    Abstract:

    We present a multithreaded event-chain Monte Carlo algorithm (ECMC) for hard spheres. Threads synchronize at infrequent breakpoints and otherwise scan for local horizon violations. Using a mapping onto absorbing Markov chains, we rigorously prove the correctness of a Sequential-Consistency implementation for small test suites. On x86 and ARM processors, a C++ (OpenMP) implementation that uses compare-and-swap primitives for data access achieves considerable speed-up with respect to single-threaded code. The generalized birthday problem suggests that for the number of threads scaling as the square root of the number of spheres, the horizon-violation probability remains small for a fixed simulation time. We provide C++ and Python open-source code that reproduces all our results.