Latency

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 318 Experts worldwide ranked by ideXlab platform

Onur Mutlu - One of the best experts on this subject based on the ideXlab platform.

  • Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
    arXiv: Hardware Architecture, 2018
    Co-Authors: Kevin K. Chang, Donghyuk Lee, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Gennady Pekhimenko, Samira Khan, Onur Mutlu
    Abstract:

    This article summarizes key results of our work on experimental characterization and analysis of latency variation and latency-reliability trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and examines the work's significance and future potential. The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for the three fundamental DRAM operations (activation, precharge, and restoration), and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make six major new observations about latency variation within DRAM. Notably, we find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced. Based on our observations, we propose Flexible-Latency DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and to access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system.
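
The controller policy the abstract describes (profile DRAM once, then issue reduced timings only in regions known to be reliably fast) can be sketched in a few lines. The timing values, region size, and function names below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch of a FLY-DRAM-style policy: keep a profiled
# per-region timing table and issue each access with the lowest
# latency its region reliably supports. All numbers are hypothetical.

DEFAULT_TRCD = 5    # conservative activation timing (cycles), assumed
FAST_TRCD = 3       # reduced timing for profiled-fast regions, assumed
REGION_SIZE = 4096  # profiling granularity in rows, assumed

def build_timing_table(slow_regions, num_regions):
    """Profiling result: regions containing any slow cell keep the
    conservative timing; all other regions may use the reduced one."""
    return [DEFAULT_TRCD if r in slow_regions else FAST_TRCD
            for r in range(num_regions)]

def trcd_for_row(timing_table, row):
    """Select the activation timing for the region this row falls in."""
    return timing_table[row // REGION_SIZE]

# Region 2 was found to contain slow cells; all other regions are fast.
table = build_timing_table(slow_regions={2}, num_regions=4)
assert trcd_for_row(table, 0) == FAST_TRCD            # fast region
assert trcd_for_row(table, 2 * REGION_SIZE) == DEFAULT_TRCD  # slow region
```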

  • Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost
    arXiv: Hardware Architecture, 2018
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    This paper summarizes the idea of Tiered-Latency DRAM (TL-DRAM), which was published in HPCA 2013, and examines the work's significance and future potential. The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. To achieve both low latency and low cost-per-bit, we introduce Tiered-Latency DRAM (TL-DRAM). In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one of the two segments to be accessed with the latency of a short-bitline DRAM without incurring a high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Our evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multiprogrammed workloads. Tiered-Latency DRAM has inspired several other works on reducing DRAM latency with little to no architectural modification.
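
The two-tier bitline idea, with the short near segment used as a cache for hot rows, can be illustrated with a toy latency model. The latency numbers, the LRU caching policy, and the class name here are invented for illustration, not taken from the paper:

```python
# Toy model of a TL-DRAM-style bank: a short "near" bitline segment
# with low access latency, and a long "far" segment behind the
# isolation transistor. Hot rows are cached in the near segment.
# All latency values (in cycles) are hypothetical.

NEAR_LATENCY = 8   # near-segment access latency, assumed
FAR_LATENCY = 14   # far-segment access latency, assumed

class TieredBank:
    def __init__(self, near_rows):
        self.near_capacity = near_rows
        self.cache = []  # rows currently held in the near segment, LRU order

    def access(self, row):
        if row in self.cache:        # near-segment hit: fast access
            self.cache.remove(row)
            self.cache.append(row)   # mark as most recently used
            return NEAR_LATENCY
        self.cache.append(row)       # far-segment access; cache the row
        if len(self.cache) > self.near_capacity:
            self.cache.pop(0)        # evict the least-recently-used row
        return FAR_LATENCY

bank = TieredBank(near_rows=2)
assert bank.access(7) == FAR_LATENCY   # cold miss: full bitline latency
assert bank.access(7) == NEAR_LATENCY  # now served from the near segment
```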

  • Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    Proceedings - International Symposium on High-Performance Computer Architecture, 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

  • HPCA - Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

Donghyuk Lee - One of the best experts on this subject based on the ideXlab platform.

  • Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips.
    arXiv: Hardware Architecture, 2018
    Co-Authors: Kevin K. Chang, Donghyuk Lee, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Gennady Pekhimenko, Samira Khan, Onur Mutlu
    Abstract:

    This article summarizes key results of our work on experimental characterization and analysis of latency variation and latency-reliability trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and examines the work's significance and future potential. The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for the three fundamental DRAM operations (activation, precharge, and restoration), and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make six major new observations about latency variation within DRAM. Notably, we find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced. Based on our observations, we propose Flexible-Latency DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and to access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system.

  • Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost
    arXiv: Hardware Architecture, 2018
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    This paper summarizes the idea of Tiered-Latency DRAM (TL-DRAM), which was published in HPCA 2013, and examines the work's significance and future potential. The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. To achieve both low latency and low cost-per-bit, we introduce Tiered-Latency DRAM (TL-DRAM). In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one of the two segments to be accessed with the latency of a short-bitline DRAM without incurring a high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Our evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multiprogrammed workloads. Tiered-Latency DRAM has inspired several other works on reducing DRAM latency with little to no architectural modification.

  • Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    Proceedings - International Symposium on High-Performance Computer Architecture, 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

  • HPCA - Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

Gideon Saar - One of the best experts on this subject based on the ideXlab platform.

  • Low-latency trading
    Journal of Financial Markets, 2013
    Co-Authors: Joel Hasbrouck, Gideon Saar
    Abstract:

    We define low-latency activity as strategies that respond to market events in the millisecond environment, the hallmark of proprietary trading by high-frequency traders, though it can include other algorithmic activity as well. We propose a new measure of low-latency activity to investigate the impact of high-frequency trading on the market environment. Our measure is highly correlated with NASDAQ-constructed estimates of high-frequency trading, but it can be computed from widely available message data. We use this measure to study how low-latency activity affects market quality both during normal market conditions and during a period of declining prices and heightened economic uncertainty. Our analysis suggests that increased low-latency activity improves traditional market quality measures: decreasing spreads, increasing displayed depth in the limit order book, and lowering short-term volatility. Our findings suggest that, given the current market structure for U.S. equities, increased low-latency activity need not work to the detriment of long-term investors.
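
As a rough illustration of quantifying millisecond-scale activity from timestamped message data, one can count the milliseconds in which message traffic is unusually bursty. This is a simplified stand-in, not the authors' actual measure (which is built from "strategic runs" of linked messages in NASDAQ order-book data); the threshold and function name are arbitrary:

```python
# Simplified sketch: bucket order-book messages (submissions,
# cancellations, executions) by millisecond and count the
# milliseconds whose message count exceeds a burst threshold.
from collections import Counter

def burst_milliseconds(timestamps_ms, threshold=3):
    """Count distinct milliseconds containing at least `threshold` messages.

    timestamps_ms: message arrival times, in integer milliseconds."""
    per_ms = Counter(timestamps_ms)
    return sum(1 for count in per_ms.values() if count >= threshold)

# Nine messages packed into two milliseconds, plus one stray message:
messages = [100] * 4 + [101] * 5 + [500]
assert burst_milliseconds(messages) == 2
```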

  • Low-latency trading
    Social Science Research Network, 2013
    Co-Authors: Joel Hasbrouck, Gideon Saar
    Abstract:

    We define low-latency activity as strategies that respond to market events in the millisecond environment, the hallmark of proprietary trading by high-frequency trading firms. We propose a new measure of low-latency activity that can be constructed from publicly available NASDAQ data to investigate the impact of high-frequency trading on the market environment. Our measure is highly correlated with NASDAQ-constructed estimates of high-frequency trading, but it can be computed from data that are more widely available. We use this measure to study how low-latency activity affects market quality both during normal market conditions and during a period of declining prices and heightened economic uncertainty. We conclude that increased low-latency activity improves traditional market quality measures: lowering short-term volatility, decreasing spreads, and increasing displayed depth in the limit order book. Of particular importance, our findings suggest that increased low-latency activity need not work to the detriment of long-term investors in the current market structure for U.S. equities.

Jamie Liu - One of the best experts on this subject based on the ideXlab platform.

  • Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost
    arXiv: Hardware Architecture, 2018
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    This paper summarizes the idea of Tiered-Latency DRAM (TL-DRAM), which was published in HPCA 2013, and examines the work's significance and future potential. The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. To achieve both low latency and low cost-per-bit, we introduce Tiered-Latency DRAM (TL-DRAM). In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one of the two segments to be accessed with the latency of a short-bitline DRAM without incurring a high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Our evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multiprogrammed workloads. Tiered-Latency DRAM has inspired several other works on reducing DRAM latency with little to no architectural modification.

  • Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    Proceedings - International Symposium on High-Performance Computer Architecture, 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

  • HPCA - Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

Lavanya Subramanian - One of the best experts on this subject based on the ideXlab platform.

  • Tiered-Latency DRAM: Enabling Low-Latency Main Memory at Low Cost
    arXiv: Hardware Architecture, 2018
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    This paper summarizes the idea of Tiered-Latency DRAM (TL-DRAM), which was published in HPCA 2013, and examines the work's significance and future potential. The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. To achieve both low latency and low cost-per-bit, we introduce Tiered-Latency DRAM (TL-DRAM). In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one of the two segments to be accessed with the latency of a short-bitline DRAM without incurring a high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Our evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multiprogrammed workloads. Tiered-Latency DRAM has inspired several other works on reducing DRAM latency with little to no architectural modification.

  • Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    Proceedings - International Symposium on High-Performance Computer Architecture, 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.

  • HPCA - Tiered-Latency DRAM: A low latency and low cost DRAM architecture
    2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), 2013
    Co-Authors: Donghyuk Lee, Yoongu Kim, Jamie Liu, Lavanya Subramanian, V. Seshadri, Onur Mutlu
    Abstract:

    The capacity and cost-per-bit of DRAM have historically scaled to satisfy the needs of increasingly large and complex computer systems. However, DRAM latency has remained almost constant, making memory latency the performance bottleneck in today's systems. We observe that the high access latency is not intrinsic to DRAM, but rather a trade-off made to decrease cost-per-bit. To mitigate the high area overhead of DRAM sensing structures, commodity DRAMs connect many DRAM cells to each sense amplifier through a wire called a bitline. These bitlines have a high parasitic capacitance due to their long length, and this bitline capacitance is the dominant source of DRAM latency. Specialized low-latency DRAMs use shorter bitlines with fewer cells, but have a higher cost-per-bit due to greater sense-amplifier area overhead. In this work, we introduce Tiered-Latency DRAM (TL-DRAM), which achieves both low latency and low cost-per-bit. In TL-DRAM, each long bitline is split into two shorter segments by an isolation transistor, allowing one segment to be accessed with the latency of a short-bitline DRAM without incurring high cost-per-bit. We propose mechanisms that use the low-latency segment as a hardware-managed or software-managed cache. Evaluations show that our proposed mechanisms improve both performance and energy efficiency for both single-core and multi-programmed workloads.