Persistent Memory

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The experts below are selected from a list of 23,541 experts worldwide, ranked by the ideXlab platform

Michael M. Swift - One of the best experts on this subject based on the ideXlab platform.

  • MOD: Minimally Ordered Durable Datastructures for Persistent Memory
    Architectural Support for Programming Languages and Operating Systems, 2020
    Co-Authors: Swapnil Haria, Mark D Hill, Michael M. Swift
    Abstract:

    Persistent memory (PM) makes possible recoverable applications that can preserve application progress across system reboots and power failures. Actual recoverability requires careful ordering of cacheline flushes, currently done in two extreme ways. On one hand, expert programmers have reasoned deeply about consistency and durability to create applications centered on a single custom-crafted durable datastructure. On the other hand, less expert programmers have used software transactional memory (STM) to make one or more updates atomic, albeit at a significant performance cost due largely to ordered log updates. In this work, we propose the middle ground of composable persistent datastructures called Minimally Ordered Durable datastructures (MOD). We prototype MOD as a library of C++ datastructures---currently map, set, stack, queue and vector---that often perform better than STM and yet are relatively easy to use. They allow multiple updates to one or more datastructures to be atomic with respect to failure. Moreover, we provide a recipe to create additional recoverable datastructures. MOD is motivated by our analysis of real Intel Optane PM hardware showing that allowing unordered, overlapping flushes significantly improves performance. MOD reduces ordering by adapting existing techniques for out-of-place updates (like shadow paging) with space-reducing structural sharing (from functional programming). MOD exposes a Basic interface for single updates and a Composition interface for atomically performing multiple updates. Relative to the widely used Intel PMDK v1.5 STM, MOD improves map, set, stack and queue microbenchmark performance by 40%, and speeds up application benchmark performance by 38%.

  • MOD: Minimally Ordered Durable Datastructures for Persistent Memory
    arXiv: Distributed Parallel and Cluster Computing, 2019
    Co-Authors: Swapnil Haria, Mark D Hill, Michael M. Swift
    Abstract:

    Persistent memory (PM) makes possible recoverable applications that can preserve application progress across system reboots and power failures. Actual recoverability requires careful ordering of cacheline flushes, currently done in two extreme ways. On one hand, expert programmers have reasoned deeply about consistency and durability to create applications centered on a single custom-crafted durable datastructure. On the other hand, less expert programmers have used software transactional memory (STM) to make one or more updates atomic, albeit at a significant performance cost due largely to ordered log updates. In this work, we propose the middle ground of composable persistent datastructures called Minimally Ordered Durable (MOD) datastructures. MOD is a C++ library of several datastructures---currently map, set, stack, queue and vector---that often perform better than STM and yet are relatively easy to use. They allow multiple updates to one or more datastructures to be atomic with respect to failure. Moreover, we provide a recipe to create more recoverable datastructures. MOD is motivated by our analysis of real Intel Optane PM hardware showing that allowing unordered, overlapping flushes significantly improves performance. MOD reduces ordering by adapting existing techniques for out-of-place updates (like shadow paging) with space-reducing structural sharing (from functional programming). MOD exposes a Basic interface for single updates and a Composition interface for atomically performing multiple updates. Relative to the state-of-the-art Intel PMDK v1.5 STM, MOD improves map, set, stack and queue microbenchmark performance by 40%, and speeds up application benchmark performance by 38%.

  • An Analysis of Persistent Memory Use with WHISPER
    Architectural Support for Programming Languages and Operating Systems, 2017
    Co-Authors: Sanketh Nalli, Haris Volos, Michael M. Swift, Swapnil Haria, Mark D Hill, Kimberly Keeton
    Abstract:

    Emerging non-volatile memory (NVM) technologies promise durability with read and write latencies comparable to volatile memory (DRAM). We define persistent memory (PM) as NVM accessed with byte addressability at low latency via normal memory instructions. Persistent-memory applications ensure the consistency of persistent data by inserting ordering points between writes to PM, allowing the construction of higher-level transaction mechanisms. An epoch is a set of writes to PM between ordering points. To put systems research in PM on a firmer footing, we developed and analyzed a PM benchmark suite called WHISPER (Wisconsin-HP Labs Suite for Persistence) that comprises ten PM applications we gathered to cover all current interfaces to PM. A quantitative analysis reveals several insights: (a) only 4% of writes in PM-aware applications are to PM and the rest are to volatile memory, (b) software transactions are often implemented with 5 to 50 ordering points, (c) 75% of epochs update exactly one 64B cache line, and (d) 80% of epochs from the same thread depend on previous epochs from the same thread, while few epochs depend on epochs from other threads. Based on our analysis, we propose the Hands-off Persistence System (HOPS) to track updates to PM in hardware. Current hardware designs require applications to force data to PM as each epoch ends. HOPS provides high-level ISA primitives for applications to express durability and ordering constraints separately and enforces them automatically, while achieving 24.3% better performance over current approaches to persistence.

  • Mnemosyne: Lightweight Persistent Memory
    2012
    Co-Authors: Haris Volos, Andres Jaan Tack, Michael M. Swift
    Abstract:

    New storage-class memory (SCM) technologies, such as phase-change memory, STT-RAM, and memristors, promise user-level access to non-volatile storage through regular memory instructions. These memory devices enable fast user-mode access to persistence, allowing regular in-memory data structures to survive system crashes. In this paper, we present Mnemosyne, a simple interface for programming with persistent memory. Mnemosyne addresses two challenges: how to create and manage such memory, and how to ensure consistency in the presence of failures. Without additional mechanisms, a system failure may leave data structures in SCM in an invalid state, crashing the program the next time it starts. In Mnemosyne, programmers declare global persistent data with the keyword “pstatic” or allocate it dynamically. Mnemosyne provides primitives for directly modifying persistent variables and supports consistent updates through a lightweight transaction mechanism. Compared to past work on disk-based persistent memory, Mnemosyne reduces latency to storage by writing data directly to memory at the granularity of an update rather than writing memory pages back to disk through the file system. In tests emulating the performance characteristics of forthcoming SCMs, we show that Mnemosyne can persist data in as little as 3 microseconds. Furthermore, it provides a 35 percent performance increase when applied in the OpenLDAP directory server. In microbenchmark studies we find that Mnemosyne can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.

  • Revamping the System Interface to Storage-Class Memory
    2012
    Co-Authors: Michael M. Swift, Haris Volos
    Abstract:

    Emerging storage-class memory (SCM) devices, such as phase-change memory and memristors, provide the interface of memory but the persistence of disks. SCM promises low-latency storage, which can benefit modern applications ranging from desktop applications that frequently flush data to stable storage, to large-scale web applications that perform many dependent lookups, such as social-network graphs. Existing operating-system interfaces, however, fail to expose the full capabilities of SCM. Under the current storage model, the OS mediates every access to storage for protection and to abstract details of the specific storage device through a driver. This causes unneeded complexity and lower performance for SCM, which can be accessed directly through the memory interface rather than peripherally via I/O requests. This dissertation revisits the system interface to storage in light of SCM. The central hypothesis is that direct access to SCM from user mode can form the basis for a flexible, high-performance storage architecture. Direct access promises much higher performance by avoiding the cost of entering the kernel and removing layers of code. Direct access also enables flexibility by letting applications customize the storage system to their needs so as to avoid paying generic overheads, further improving performance. The dissertation presents the design, implementation, and evaluation of an operating-system storage architecture based on direct access to SCM. The architecture comprises two parts. The first part, Mnemosyne, exposes persistent memory to user-mode programs. Persistent memory enables programs to directly store and access common in-memory data structures, such as trees, logs, and hash tables, in regions of SCM (private to each program).
While persistent memory enables low-latency, flexible storage, it sacrifices the common file-system interface, which enables application interoperability by organizing data under a global logical namespace for easy access and protects data for secure sharing. The second part, Aerie, fills this gap. Aerie exposes a flexible file-system interface to SCM using user-mode libraries that access file data directly through memory. Aerie retains both the benefits of direct access and the sharing and protection features of file systems.

Eleazar Leal - One of the best experts on this subject based on the ideXlab platform.

  • A Persistent Memory-Aware Buffer Pool Manager Simulator for Multi-Tenant Cloud Databases
    International Conference on Data Engineering, 2020
    Co-Authors: Taras Basiuk, Le Gruenwald, Laurent d'Orazio, Eleazar Leal
    Abstract:

    Non-volatile memory (NVM) is a promising development for database management systems (DBMSs), offering abundant and fast storage to complement traditional disk and main-memory architectures. NVM introduces additional data-migration possibilities for the traditional buffer pool (BP) managers used by the DBMS. Hence, the efficient use of this new technology requires a redesign of the BP manager. For cloud Database-as-a-Service products, this need for a redesign is further complicated by cloud providers’ traditional goal of minimizing the Service Level Agreement (SLA) violation penalties paid to their tenants. Unfortunately, current research in the area does not provide a comprehensive picture of the components constituting a multi-tenant, persistent-memory-aware BP manager for a cloud database that makes use of NVM. Furthermore, researchers lack the software tools needed to quickly prototype and estimate the effectiveness of novel data-management policies guiding those components. In this paper, we attempt to remedy both issues: first, by proposing a generalized framework that defines the purpose and the abstract interfaces of the various multi-tenant, persistent-memory-aware BP manager components, and second, by developing and demonstrating a simulator algorithm that is shown to aid in quick testing of different implementations of those BP manager components.

Haris Volos - One of the best experts on this subject based on the ideXlab platform.

  • An Analysis of Persistent Memory Use with WHISPER
    Architectural Support for Programming Languages and Operating Systems, 2017
    Co-Authors: Sanketh Nalli, Haris Volos, Michael M. Swift, Swapnil Haria, Mark D Hill, Kimberly Keeton
    Abstract:

    Emerging non-volatile memory (NVM) technologies promise durability with read and write latencies comparable to volatile memory (DRAM). We define persistent memory (PM) as NVM accessed with byte addressability at low latency via normal memory instructions. Persistent-memory applications ensure the consistency of persistent data by inserting ordering points between writes to PM, allowing the construction of higher-level transaction mechanisms. An epoch is a set of writes to PM between ordering points. To put systems research in PM on a firmer footing, we developed and analyzed a PM benchmark suite called WHISPER (Wisconsin-HP Labs Suite for Persistence) that comprises ten PM applications we gathered to cover all current interfaces to PM. A quantitative analysis reveals several insights: (a) only 4% of writes in PM-aware applications are to PM and the rest are to volatile memory, (b) software transactions are often implemented with 5 to 50 ordering points, (c) 75% of epochs update exactly one 64B cache line, and (d) 80% of epochs from the same thread depend on previous epochs from the same thread, while few epochs depend on epochs from other threads. Based on our analysis, we propose the Hands-off Persistence System (HOPS) to track updates to PM in hardware. Current hardware designs require applications to force data to PM as each epoch ends. HOPS provides high-level ISA primitives for applications to express durability and ordering constraints separately and enforces them automatically, while achieving 24.3% better performance over current approaches to persistence.

  • Quartz: A Lightweight Performance Emulator for Persistent Memory Software
    Proceedings of the 16th Annual Middleware Conference on, 2015
    Co-Authors: Haris Volos, Guilherme Magalhaes, Ludmila Cherkasova
    Abstract:

    Next-generation non-volatile memory (NVM) technologies, such as phase-change memory and memristors, can enable computer systems infrastructure to continue keeping up with the voracious appetite of data-centric applications for large, cheap, and fast storage. Persistent memory has emerged as a promising approach to accessing emerging byte-addressable non-volatile memory through processor load/store instructions. Due to the lack of commercially available NVM, system software researchers have mainly relied on emulation to model persistent memory performance. However, existing emulation approaches are either too simplistic, too slow to emulate large-scale workloads, or require special hardware. To fill this gap and encourage wider adoption of persistent memory, we developed a performance emulator for persistent memory called Quartz. Quartz enables efficient emulation of a wide range of NVM latency and bandwidth characteristics for performance evaluation of emerging byte-addressable NVMs and their impact on application performance (without modifying or instrumenting source code) by leveraging features available in commodity hardware. Our emulator is implemented on three recent Intel Xeon-based processor architectures: Sandy Bridge, Ivy Bridge, and Haswell. To assist researchers and engineers in evaluating design decisions with emerging NVMs, we extend Quartz to emulate application execution on future systems with two types of memory: fast, regular volatile DRAM and slower persistent memory. We evaluate the effectiveness of our approach using a set of specially designed memory-intensive benchmarks and real applications. The accuracy of the proposed approach is validated by running these programs both on our emulation platform and on a multi-socket (NUMA) machine that can support a range of memory latencies. We show that Quartz can emulate a range of performance characteristics with low overhead and good accuracy (with emulation errors of 0.2% to 9%).

  • Mnemosyne: Lightweight Persistent Memory
    2012
    Co-Authors: Haris Volos, Andres Jaan Tack, Michael M. Swift
    Abstract:

    New storage-class memory (SCM) technologies, such as phase-change memory, STT-RAM, and memristors, promise user-level access to non-volatile storage through regular memory instructions. These memory devices enable fast user-mode access to persistence, allowing regular in-memory data structures to survive system crashes. In this paper, we present Mnemosyne, a simple interface for programming with persistent memory. Mnemosyne addresses two challenges: how to create and manage such memory, and how to ensure consistency in the presence of failures. Without additional mechanisms, a system failure may leave data structures in SCM in an invalid state, crashing the program the next time it starts. In Mnemosyne, programmers declare global persistent data with the keyword “pstatic” or allocate it dynamically. Mnemosyne provides primitives for directly modifying persistent variables and supports consistent updates through a lightweight transaction mechanism. Compared to past work on disk-based persistent memory, Mnemosyne reduces latency to storage by writing data directly to memory at the granularity of an update rather than writing memory pages back to disk through the file system. In tests emulating the performance characteristics of forthcoming SCMs, we show that Mnemosyne can persist data in as little as 3 microseconds. Furthermore, it provides a 35 percent performance increase when applied in the OpenLDAP directory server. In microbenchmark studies we find that Mnemosyne can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.

  • Revamping the System Interface to Storage-Class Memory
    2012
    Co-Authors: Michael M. Swift, Haris Volos
    Abstract:

    Emerging storage-class memory (SCM) devices, such as phase-change memory and memristors, provide the interface of memory but the persistence of disks. SCM promises low-latency storage, which can benefit modern applications ranging from desktop applications that frequently flush data to stable storage, to large-scale web applications that perform many dependent lookups, such as social-network graphs. Existing operating-system interfaces, however, fail to expose the full capabilities of SCM. Under the current storage model, the OS mediates every access to storage for protection and to abstract details of the specific storage device through a driver. This causes unneeded complexity and lower performance for SCM, which can be accessed directly through the memory interface rather than peripherally via I/O requests. This dissertation revisits the system interface to storage in light of SCM. The central hypothesis is that direct access to SCM from user mode can form the basis for a flexible, high-performance storage architecture. Direct access promises much higher performance by avoiding the cost of entering the kernel and removing layers of code. Direct access also enables flexibility by letting applications customize the storage system to their needs so as to avoid paying generic overheads, further improving performance. The dissertation presents the design, implementation, and evaluation of an operating-system storage architecture based on direct access to SCM. The architecture comprises two parts. The first part, Mnemosyne, exposes persistent memory to user-mode programs. Persistent memory enables programs to directly store and access common in-memory data structures, such as trees, logs, and hash tables, in regions of SCM (private to each program).
While persistent memory enables low-latency, flexible storage, it sacrifices the common file-system interface, which enables application interoperability by organizing data under a global logical namespace for easy access and protects data for secure sharing. The second part, Aerie, fills this gap. Aerie exposes a flexible file-system interface to SCM using user-mode libraries that access file data directly through memory. Aerie retains both the benefits of direct access and the sharing and protection features of file systems.

  • Mnemosyne: Lightweight Persistent Memory
    Architectural Support for Programming Languages and Operating Systems, 2011
    Co-Authors: Haris Volos, Andres Jaan Tack, Michael M. Swift
    Abstract:

    New storage-class memory (SCM) technologies, such as phase-change memory, STT-RAM, and memristors, promise user-level access to non-volatile storage through regular memory instructions. These memory devices enable fast user-mode access to persistence, allowing regular in-memory data structures to survive system crashes. In this paper, we present Mnemosyne, a simple interface for programming with persistent memory. Mnemosyne addresses two challenges: how to create and manage such memory, and how to ensure consistency in the presence of failures. Without additional mechanisms, a system failure may leave data structures in SCM in an invalid state, crashing the program the next time it starts. In Mnemosyne, programmers declare global persistent data with the keyword "pstatic" or allocate it dynamically. Mnemosyne provides primitives for directly modifying persistent variables and supports consistent updates through a lightweight transaction mechanism. Compared to past work on disk-based persistent memory, Mnemosyne reduces latency to storage by writing data directly to memory at the granularity of an update rather than writing memory pages back to disk through the file system. In tests emulating the performance characteristics of forthcoming SCMs, we show that Mnemosyne can persist data in as little as 3 microseconds. Furthermore, it provides a 35 percent performance increase when applied in the OpenLDAP directory server. In microbenchmark studies we find that Mnemosyne can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.

Guy E Blelloch - One of the best experts on this subject based on the ideXlab platform.

  • Delay-Free Concurrency on Faulty Persistent Memory
    ACM Symposium on Parallel Algorithms and Architectures, 2019
    Co-Authors: Naama Ben-David, Guy E Blelloch, Michal Friedman, Yuanhao Wei
    Abstract:

    Non-volatile memory (NVM) promises persistent main memory that remains correct despite loss of power. This has sparked a line of research into algorithms that can recover from a system crash. Since caches are expected to remain volatile, concurrent data structures and algorithms must be redesigned to guarantee that they are left in a consistent state after a system crash, and that execution can be continued upon recovery. However, the prospect of redesigning every concurrent data structure or algorithm before it can be used in NVM architectures is daunting. In this paper, we present a construction that takes any concurrent program with reads, writes and CASes to shared memory and makes it persistent, i.e., able to be continued after one or more processes fault and have to restart. The converted algorithm has constant computational delay (it preserves instruction counts on each process within a constant factor), as well as constant recovery delay (a process can recover from a fault in a constant number of instructions). We show this first for a simple transformation, and then present optimizations to make it more practical, allowing for a trade-off between computation and recovery delay. We also provide an optimized transformation for normalized lock-free data structures, thus speeding up a large class of concurrent algorithms. Finally, we experimentally evaluate these transformations by applying them to a queue. We compare the performance of our transformations to that of a persistent transactional memory framework, Romulus, and to a hand-tuned persistent queue. We show that our optimized transformation performs favorably when compared to Romulus. Furthermore, our optimized transformation is even comparable to the hand-tuned version, showing that the generality we provide comes at very little performance cost.

  • The Parallel Persistent Memory Model
    ACM Symposium on Parallel Algorithms and Architectures, 2018
    Co-Authors: Guy E Blelloch, Phillip B Gibbons, Charles Mcguffey, Julian Shun
    Abstract:

    We consider a parallel computational model, the Parallel Persistent Memory model, comprising $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows each processor to fault at any time (with bounded probability), and possibly restart. When a processor faults, all of its state and local ephemeral memory is lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are nearly as fast as existing random-access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. We present several results for the model, using an approach that breaks a computation into capsules, each of which can be safely run multiple times. For the single-processor version we describe how to simulate any program in the RAM, the external memory model, or the ideal-cache model with an expected constant factor overhead. For the multiprocessor version we describe how to efficiently implement a work-stealing scheduler within the model such that it handles both soft faults, with a processor restarting, and hard faults, with a processor permanently failing. For any multithreaded fork-join computation that is race free, write-after-read conflict free, and has $W$ work, $D$ depth, and $C$ maximum capsule work in the absence of faults, the scheduler guarantees a time bound on the model of $O\left(\frac{W}{P_A} + \frac{DP}{P_A}\left\lceil\log_{1/(Cf)} W\right\rceil\right)$ in expectation, where $P$ is the maximum number of processors, $P_A$ is the average number, and $f \leq 1/(2C)$ is the probability a processor faults between successive persistent memory accesses. Within the model, and using the proposed methods, we develop efficient algorithms for parallel prefix sums, merging, sorting, and matrix multiply.

  • The Parallel Persistent Memory Model
    arXiv: Distributed Parallel and Cluster Computing, 2018
    Co-Authors: Guy E Blelloch, Phillip B Gibbons, Charles Mcguffey, Julian Shun
    Abstract:

    We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows each processor to fault with bounded probability, and possibly restart. On faulting, all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random-access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality-efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do this in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solving these challenges based on breaking computations into what we call capsules, which have certain properties, and developing a work-stealing scheduler that functions properly within the context of failures. The scheduler guarantees a time bound of $O(W/P_A + D(P/P_A) \lceil\log_{1/f} W\rceil)$ in expectation, where $W$ and $D$ are the work and depth of the computation (in the absence of failures), $P_A$ is the average number of processors available during the computation, and $f \le 1/2$ is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.

Jishen Zhao - One of the best experts on this subject based on the ideXlab platform.

  • Persistent Memory Workload Characterization: A Hardware Perspective
    IEEE International Symposium on Workload Characterization, 2019
    Co-Authors: Xiao Liu, Bhaskar Jupudi, Pankaj Mehra, Jishen Zhao
    Abstract:

    Persistent memory is a new tier of memory that functions as a hybrid of traditional storage systems and main memory. It combines the advantages of both: the data-persistence property of storage with the byte addressability and fast load/store interface of memory. As such, persistent memory provides direct data access without the performance and energy overhead of secondary storage access. Being at an early stage of development, most previous persistent memory system designs are motivated and evaluated by software-based performance profiling and characterization. Yet, because it attaches to the processor-memory bus, persistent memory is managed by both system software and hardware control units in processors and memory devices. Therefore, understanding hardware behavior is critical to unlocking the full potential of persistent memory. In this paper, we explore the performance interaction across applications, persistent memory system software, and hardware components, such as caching, address translation, buffering, and control logic in processors and memory systems. Based on our characterization results, we provide a set of implications and recommendations that can be used to optimize persistent memory system software and hardware designs.

  • PMTest: A Fast and Flexible Testing Framework for Persistent Memory Programs
    Architectural Support for Programming Languages and Operating Systems, 2019
    Co-Authors: Sihang Liu, Jishen Zhao, Aasheesh Kolli, Yizhou Wei, Samira Khan
    Abstract:

    Recent non-volatile Memory technologies such as 3D XPoint and NVDIMMs have enabled Persistent Memory (PM) systems that can manipulate Persistent data directly in Memory. This advancement of Memory technology has spurred the development of a new set of crash-consistent software (CCS) for PM - applications that can recover Persistent data from Memory in a consistent state in the event of a crash (e.g., power failure). CCS developed for Persistent Memory ranges from kernel modules to user-space libraries and custom applications. However, ensuring crash consistency in CCS is difficult and error-prone. Programmers typically employ low-level hardware primitives or transactional libraries to enforce ordering and durability guarantees that are required for ensuring crash consistency. Unfortunately, hardware can reorder instructions at runtime, making it difficult for the programmers to test whether the implementation enforces the correct ordering and durability guarantees. We believe that there is an urgent need for developing a testing framework that helps programmers identify crash consistency bugs in their CCS. We find that prior testing tools lack generality, i.e., they work only for one specific CCS or Memory persistency model and/or introduce significant performance overhead. To overcome these drawbacks, we propose PMTest (available at https://pmtest.PersistentMemory.org), a crash consistency testing framework that is both flexible and fast. PMTest provides flexibility by providing two basic assertion-like software checkers to test two fundamental characteristics of all CCS: the ordering and durability guarantee. These checkers can also serve as the building blocks of other application-specific, high-level checkers. PMTest enables fast testing by deducing the persist order without exhausting all possible orders. 
In the evaluation with eight programs, PMTest not only identified 45 synthetic crash consistency bugs, but also detected 3 new bugs in a file system (PMFS) and in applications developed using a transactional library (PMDK), while on average being 7.1× faster than the state-of-the-art tool.
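The kind of ordering and durability checking described above can be illustrated with a small trace-based sketch. This is not PMTest's implementation or API; it is a minimal simulation, assuming a simplified model in which a store becomes durable only after it is flushed and a subsequent fence completes, so ordering between two persists is decided by which fence made each durable.

```python
# Illustrative sketch (not the PMTest implementation): a trace-based checker
# for the two guarantees the paper targets -- durability ("did this store
# persist?") and ordering ("did A persist before B?").

def check_trace(events, a, b):
    """events: list of ('store', addr), ('flush', addr), or ('fence',).
    A store is durable once it is flushed and a later fence completes.
    Returns (a_durable, b_durable, a_persisted_before_b)."""
    persist_time = {}   # addr -> index of the fence that made it durable
    flushed = set()     # addrs flushed since the last fence
    stored = set()
    for i, ev in enumerate(events):
        if ev[0] == 'store':
            stored.add(ev[1])
        elif ev[0] == 'flush' and ev[1] in stored:
            flushed.add(ev[1])
        elif ev[0] == 'fence':
            for addr in flushed:
                persist_time.setdefault(addr, i)
            flushed.clear()
    a_durable = a in persist_time
    b_durable = b in persist_time
    ordered = a_durable and b_durable and persist_time[a] < persist_time[b]
    return a_durable, b_durable, ordered

# Correct undo-log pattern: persist the log entry before the data.
good = [('store', 'log'), ('flush', 'log'), ('fence',),
        ('store', 'data'), ('flush', 'data'), ('fence',)]
# Buggy pattern: no fence between the two flushes, so hardware may reorder.
bad = [('store', 'log'), ('flush', 'log'),
       ('store', 'data'), ('flush', 'data'), ('fence',)]

print(check_trace(good, 'log', 'data'))  # (True, True, True)
print(check_trace(bad, 'log', 'data'))   # (True, True, False)
```

The "bad" trace shows why such bugs are hard to spot by inspection: both values do become durable, but the missing fence leaves their relative persist order undefined, which is exactly the class of bug an ordering checker flags.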

  • steal but no force: efficient hardware undo+redo logging for Persistent Memory systems
    High-Performance Computer Architecture, 2018
    Co-Authors: Matheus Ogleari, Ethan L Miller, Jishen Zhao
    Abstract:

    Persistent Memory is a new tier of Memory that functions as a hybrid of traditional storage systems and main Memory. It combines the benefits of both: the data persistence of storage with the fast load/store interface of Memory. Most previous Persistent Memory designs place careful control over the order of writes arriving at Persistent Memory. This can prevent caches and Memory controllers from optimizing system performance through write coalescing and reordering. We identify that such write-order control can be relaxed by employing undo+redo logging for data in Persistent Memory systems. However, traditional software logging mechanisms are expensive to adopt in Persistent Memory due to performance and energy overheads. Previously proposed hardware logging schemes are inefficient and do not fully address the issues in software. To address these challenges, we propose a hardware undo+redo logging scheme which maintains data persistence by leveraging the write-back, write-allocate policies used in commodity caches. Furthermore, we develop a cache force write-back mechanism in hardware to significantly reduce the performance and energy overheads from forcing data into Persistent Memory. Our evaluation across Persistent Memory microbenchmarks and real workloads demonstrates that our design significantly improves system throughput and reduces both dynamic energy and Memory traffic. It also provides strong consistency guarantees compared to software approaches.
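The reason undo+redo logging permits relaxed write ordering can be shown in a few lines: once both the old and new value of every update are logged, recovery can repair the data region regardless of which data writes happened to reach Persistent Memory before the crash. The sketch below is a software analogue under that assumption, not the paper's hardware scheme.

```python
# Minimal software analogue of undo+redo logging: each update appends
# (addr, old_value, new_value) to a log. On recovery, an uncommitted
# transaction rolls back with the undo values; a committed one rolls
# forward with the redo values. Data writes may reach Persistent Memory
# in any order -- only the log entries must persist first.

def recover(memory, log, committed):
    """memory: dict addr -> value as found after a crash.
    log: list of (addr, old_value, new_value) in program order."""
    if committed:
        for addr, _old, new in log:            # roll forward (redo)
            memory[addr] = new
    else:
        for addr, old, _new in reversed(log):  # roll back (undo)
            memory[addr] = old
    return memory

log = [('x', 0, 1), ('y', 0, 2)]
# Crash mid-transaction: 'x' reached Persistent Memory, 'y' did not.
print(recover({'x': 1, 'y': 0}, log, committed=False))  # {'x': 0, 'y': 0}
print(recover({'x': 1, 'y': 0}, log, committed=True))   # {'x': 1, 'y': 2}
```

Either way recovery lands in a consistent state, which is why the caches and Memory controller are free to coalesce and reorder the data writes themselves.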

  • ThyNVM: Enabling software-transparent crash consistency in Persistent Memory systems
    2015 48th Annual IEEE ACM International Symposium on Microarchitecture (MICRO), 2015
    Co-Authors: Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, Onur Mutlu
    Abstract:

    Emerging byte-addressable nonvolatile memories (NVMs) promise Persistent Memory, which allows processors to directly access Persistent data in main Memory. Yet, Persistent Memory systems need to guarantee a consistent Memory state in the event of power loss or a system crash (i.e., crash consistency). To guarantee crash consistency, most prior works rely on programmers to (1) partition Persistent and transient Memory data and (2) use specialized software interfaces when updating Persistent Memory data. As a result, taking advantage of Persistent Memory requires significant programmer effort, e.g., to implement new programs as well as modify legacy programs. Use cases and adoption of Persistent Memory can therefore be largely limited. In this paper, we propose a hardware-assisted DRAM+NVM hybrid Persistent Memory design, Transparent Hybrid NVM (ThyNVM), which supports software-transparent crash consistency of Memory data in a hybrid Memory system. To efficiently enforce crash consistency, we design a new dual-scheme checkpointing mechanism, which efficiently overlaps checkpointing time with application execution time. The key novelty is to enable checkpointing of data at multiple granularities, cache block or page granularity, in a coordinated manner. This design is based on our insight that there is a tradeoff between the application stall time due to checkpointing and the hardware storage overhead of the metadata for checkpointing, both of which are dictated by the granularity of checkpointed data. To get the best of the tradeoff, our technique adapts the checkpointing granularity to the write locality characteristics of the data and coordinates the management of multiple-granularity updates. Our evaluation across a variety of applications shows that ThyNVM performs within 4.9% of an idealized DRAM-only system that can provide crash consistency at no cost.
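The granularity tradeoff at the heart of ThyNVM can be made concrete with a toy cost model. This is an illustrative sketch, not the hardware design; the metadata costs and block/page sizes below are assumed constants chosen only to show the crossover between per-block and per-page checkpointing.

```python
# Toy model of ThyNVM's granularity tradeoff: per-block checkpointing
# writes only dirty blocks but pays metadata per block; per-page
# checkpointing writes the whole page with a single metadata entry.
# Pages with high write locality favor page granularity.

BLOCKS_PER_PAGE = 64
BLOCK_BYTES = 64
BLOCK_META = 8   # assumed per-block metadata cost (bytes)
PAGE_META = 8    # assumed per-page metadata cost (bytes)

def choose_granularity(dirty_blocks):
    """Pick the cheaper checkpointing granularity for one page."""
    block_cost = dirty_blocks * (BLOCK_BYTES + BLOCK_META)
    page_cost = BLOCKS_PER_PAGE * BLOCK_BYTES + PAGE_META
    return 'page' if page_cost <= block_cost else 'block'

print(choose_granularity(dirty_blocks=60))  # 'page'  (high write locality)
print(choose_granularity(dirty_blocks=3))   # 'block' (sparse writes)
```

Adapting the choice per page to observed write locality, as the abstract describes, gets the low stall time of coarse checkpoints where writes cluster and the low write volume of fine checkpoints where they do not.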

  • firm: fair and high performance Memory control for Persistent Memory systems
    International Symposium on Microarchitecture, 2014
    Co-Authors: Jishen Zhao, Onur Mutlu, Yuan Xie
    Abstract:

    Byte-addressable nonvolatile memories promise a new technology, Persistent Memory, which incorporates desirable attributes from both traditional main Memory (byte-addressability and fast interface) and traditional storage (data persistence). To support data persistence, a Persistent Memory system requires sophisticated data duplication and ordering control for write requests. As a result, applications that manipulate Persistent Memory (Persistent applications) have very different Memory access characteristics than traditional (non-Persistent) applications, as shown in this paper. Persistent applications introduce heavy write traffic to contiguous Memory regions at a Memory channel, which cannot concurrently service read and write requests, leading to Memory bandwidth underutilization due to low bank-level parallelism, frequent write queue drains, and frequent bus turnarounds between reads and writes. These characteristics undermine the high-performance and fairness offered by conventional Memory scheduling schemes designed for non-Persistent applications. Our goal in this paper is to design a fair and high-performance Memory control scheme for a Persistent Memory based system that runs both Persistent and non-Persistent applications. Our proposal, FIRM, consists of three key ideas. First, FIRM categorizes request sources as non-intensive, streaming, random and Persistent, and forms batches of requests for each source. Second, FIRM strides Persistent Memory updates across multiple banks, thereby improving bank-level parallelism and hence Memory bandwidth utilization of Persistent Memory accesses. Third, FIRM schedules read and write request batches from different sources in a manner that minimizes bus turnarounds and write queue drains. Our detailed evaluations show that, compared to five previous Memory scheduler designs, FIRM provides significantly higher system performance and fairness.
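FIRM's second idea, striding Persistent Memory updates across banks, can be sketched with a toy address-mapping model. This is illustrative only: the bank count, row size, and round-robin placement below are assumptions standing in for the real scheduler, chosen to show why contiguous Persistent writes underuse bank-level parallelism.

```python
# Toy model of bank-level parallelism: under a row-interleaved mapping,
# a contiguous stream of Persistent writes lands in one bank; striding
# the buffered updates round-robin spreads them across all banks.

NUM_BANKS = 8
ROW_BYTES = 8192

def banks_touched(addresses, strided):
    """Count distinct banks serviced by a batch of write addresses."""
    if strided:
        # FIRM-style striding: place buffered updates round-robin.
        return len({i % NUM_BANKS for i in range(len(addresses))})
    # Baseline row-interleaved mapping: bank = row index mod bank count.
    return len({(a // ROW_BYTES) % NUM_BANKS for a in addresses})

# 64 contiguous 64-byte cache-line writes to one Memory region.
writes = [0x1000 + 64 * i for i in range(64)]
print(banks_touched(writes, strided=False))  # 1 bank  -> serialized writes
print(banks_touched(writes, strided=True))   # 8 banks -> parallel writes
```

With all writes serialized in one bank, the write queue drains slowly and forces the frequent bus turnarounds the abstract describes; spreading the batch across banks lets the drains complete quickly before reads resume.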