Main Memory Access

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The Experts below are selected from a list of 246 Experts worldwide ranked by the ideXlab platform

Martin Schoeberl - One of the best experts on this subject based on the ideXlab platform.

  • SEUS - A Single-Path Chip-Multiprocessor System
    Software Technologies for Embedded and Ubiquitous Systems, 2009
    Co-Authors: Martin Schoeberl, Peter Puschner, Raimund Kirner
    Abstract:

    In this paper we explore the combination of a time-predictable chip-multiprocessor system with the single-path programming paradigm. Time-sliced arbitration of main memory access provides time-predictable memory load and store instructions. Single-path programming avoids control-flow-dependent timing variations. To keep the execution time of tasks constant, even in the case of shared memory access by several processor cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit.
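
    The time-sliced (TDMA) arbitration idea can be sketched as a small model. This is an illustrative model, not the paper's implementation; the slot length and core count are hypothetical parameters. The point it shows: each core's worst-case wait for memory is a constant, independent of what the other cores do.

    ```python
    # Hypothetical TDMA memory-arbitration model: each of NUM_CORES cores
    # owns one fixed slot per round, so the worst-case wait before a memory
    # access is bounded by the round length, not by the other cores' behavior.
    SLOT_CYCLES = 2   # cycles per TDMA slot (assumed)
    NUM_CORES = 4     # cores sharing main memory (assumed)
    ROUND = SLOT_CYCLES * NUM_CORES

    def wait_cycles(core: int, t: int) -> int:
        """Cycles `core` must wait at time `t` until its slot is active."""
        slot_start = core * SLOT_CYCLES
        offset = (t - slot_start) % ROUND
        return 0 if offset < SLOT_CYCLES else ROUND - offset

    # Worst case occurs when the core has just missed its own slot.
    wcet_wait = max(wait_cycles(0, t) for t in range(ROUND))
    print(wcet_wait)  # → 6, i.e. ROUND - SLOT_CYCLES
    ```

    Because this bound holds regardless of the other cores' access patterns, a WCET analysis can charge every load/store a fixed latency, which is exactly what makes the memory predictable.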

  • OTM Workshops - A Time Predictable Instruction Cache for a Java Processor
    Lecture Notes in Computer Science, 2004
    Co-Authors: Martin Schoeberl
    Abstract:

    Cache memories are mandatory to bridge the growing gap between CPU speed and main memory access time. Standard cache organizations improve the average execution time but are difficult to predict for worst-case execution time (WCET) analysis. This paper proposes a different cache architecture, intended to ease WCET analysis. The cache stores complete methods, and cache misses occur only on method invocation and return. Cache block replacement depends on the call tree, instead of instruction addresses.
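
    A minimal sketch of the method-cache idea follows. It is not Schoeberl's actual design: real replacement depends on the call tree, while this sketch uses plain LRU as a stand-in, and the method names and capacity are made up. What it preserves is the key property — misses can only happen at `invoke` (and, symmetrically, at return), never inside a method body.

    ```python
    class MethodCache:
        """Toy method cache: whole methods are loaded on invocation, so a
        miss can only occur at an invoke/return boundary (LRU replacement
        here is a simplification of the paper's call-tree-based scheme)."""
        def __init__(self, capacity_words: int):
            self.capacity = capacity_words
            self.resident = {}   # method name -> size in words
            self.lru = []        # least-recently-invoked first

        def invoke(self, method: str, size: int) -> bool:
            """Return True on a hit; on a miss, evict whole methods until
            the new one fits, then load it."""
            if method in self.resident:
                self.lru.remove(method)
                self.lru.append(method)
                return True
            while sum(self.resident.values()) + size > self.capacity:
                victim = self.lru.pop(0)
                del self.resident[victim]
            self.resident[method] = size
            self.lru.append(method)
            return False

    cache = MethodCache(capacity_words=100)
    print(cache.invoke("main", 60))    # → False (cold miss)
    print(cache.invoke("helper", 50))  # → False (miss; evicts "main")
    print(cache.invoke("helper", 50))  # → True  (hit)
    ```

    Since misses are confined to invocation and return points, a WCET tool only has to reason about the call graph, not about every instruction fetch.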

Stefan Manegold - One of the best experts on this subject based on the ideXlab platform.

  • Breaking the memory wall in MonetDB
    Communications of the ACM, 2008
    Co-Authors: Peter Boncz, Martin L Kersten, Stefan Manegold
    Abstract:

    In the past decades, advances in speed of commodity CPUs have far outpaced advances in RAM latency. Main-memory access has therefore become a performance bottleneck for many computer applications; a phenomenon that is widely known as the "memory wall." In this paper, we report how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall. This encompasses (i) a redesign of the query execution model to better exploit pipelined CPU architectures and CPU instruction caches; (ii) the use of columnar rather than row-wise data storage to better exploit CPU data caches; (iii) the design of new cache-conscious query processing algorithms; and (iv) the design and automatic calibration of memory cost models to choose and tune these cache-conscious algorithms in the query optimizer.
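
    Point (ii), columnar storage, can be illustrated with a small sketch. The schema and values are invented for illustration; Python also cannot show the cache effect directly, but the layout difference it demonstrates is the real mechanism: a scan over one attribute in a column store reads a contiguous array, while a row store drags every unused field of every row through the cache.

    ```python
    # Illustrative comparison of row-wise (NSM) vs. columnar (DSM) layout
    # for a single-attribute scan. Schema and data are hypothetical.
    rows = [(i, f"name{i}", i * 1.5) for i in range(1000)]   # row store
    cols = {"id":    [r[0] for r in rows],                   # column store
            "name":  [r[1] for r in rows],
            "price": [r[2] for r in rows]}

    # Row store: the scan touches whole rows, wasting cache-line space
    # on 'id' and 'name' even though only 'price' is needed.
    row_sum = sum(r[2] for r in rows)

    # Column store: the scan touches only the contiguous 'price' array,
    # so every byte brought into the cache is useful.
    col_sum = sum(cols["price"])
    print(row_sum == col_sum)  # → True: same answer, different memory traffic
    ```

    On real hardware the columnar scan moves a fraction of the cache lines, which is precisely the effect MonetDB's column-at-a-time design exploits.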

  • Optimizing database architecture for the new bottleneck: memory access
    The VLDB Journal, 2000
    Co-Authors: Stefan Manegold, Peter Boncz, Martin L Kersten
    Abstract:

    In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.
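
    The radix-join idea can be sketched as follows. This is a simplified single-pass variant with made-up data and a tiny radix; the paper's algorithm uses multiple partitioning passes and tuned fan-outs. The structure is the same: partition both inputs on the low bits of the join key so that each per-partition hash table is small enough to stay cache-resident, turning a random-access probe into many cache-friendly ones.

    ```python
    # Simplified one-pass radix-partitioned hash join (illustrative only).
    RADIX_BITS = 2
    NPART = 1 << RADIX_BITS

    def partition(pairs):
        """Scatter (key, payload) pairs by the low RADIX_BITS of the key."""
        parts = [[] for _ in range(NPART)]
        for key, payload in pairs:
            parts[key & (NPART - 1)].append((key, payload))
        return parts

    def radix_join(left, right):
        out = []
        for lp, rp in zip(partition(left), partition(right)):
            table = {}                 # small, per-partition hash table
            for k, v in lp:            # build phase
                table.setdefault(k, []).append(v)
            for k, v in rp:            # probe phase
                for lv in table.get(k, []):
                    out.append((k, lv, v))
        return out

    L = [(1, "a"), (2, "b"), (5, "c")]
    R = [(1, "x"), (5, "y"), (3, "z")]
    print(sorted(radix_join(L, R)))  # → [(1, 'a', 'x'), (5, 'c', 'y')]
    ```

    Matching keys always land in the same partition, so correctness is unchanged; what changes is the memory access pattern during the probe, which is the quantity the paper's cost model optimizes.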

Kisaburo Nakazawa - One of the best experts on this subject based on the ideXlab platform.

  • A Superscalar RISC Processor with Pseudo Vector Processing Feature
    1995
    Co-Authors: Kotaro Shimamura, S. Tanaka, Tetsuya Shimomura, Takashi Hotta, E. Kamada, Hideo Sawamoto, T. Shimizu, Kisaburo Nakazawa
    Abstract:

    A novel architectural extension, in which floating-point data are transferred directly from main memory to floating-point registers, has been successfully implemented in a superscalar RISC processor. This extension allows main memory access throughput of 1.2 Gbyte/s, and effective performance reaches 267 MFLOPS (89% of the peak performance) for typical floating-point applications. The processor utilizes 0.3-micron 4-level metal CMOS technology with a 2.5 V power supply and contains 3.9 million transistors in a 15.7 mm × 15.7 mm die. Only 4.5% of the die area is used for the extension. Pipeline stage optimization and a scoreboard-based dependency check method allow the extension to be realized without affecting the operating frequency.

  • HICSS (1) - Evaluation of pseudo vector processor based on slide-windowed registers
    Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences HICSS-94, 1994
    Co-Authors: Hiroshi Nakamura, Hiromitsu Imori, Y. Yamashita, Kisaburo Nakazawa, Taisuke Boku, Ikuo Nakata
    Abstract:

    We present a new scalar processor for high-speed vector processing and its evaluation. The proposed processor can hide long main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. Owing to the slide-window structure, the proposed processor can utilize more floating-point registers while keeping upward compatibility with existing scalar architecture. We have evaluated its performance on the Livermore Fortran Kernels. The evaluation results show that the proposed processor drastically reduces the penalty of main memory access compared with an ordinary scalar processor. For example, the proposed processor with 96 registers hides a memory access latency of 70 CPU cycles when the throughput of main memory is 8 bytes/cycle. From these results, it is concluded that the proposed architecture is very suitable for high-speed vector processing.
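
    The latency-hiding arithmetic behind such preloading can be sketched with a simple cycle model. Only the 70-cycle latency comes from the abstract; the per-iteration compute cost is a made-up parameter, and this back-of-the-envelope model is not the paper's evaluation methodology. The idea: if a load is issued far enough ahead of its consumer, the stall disappears, provided there are enough registers to hold all in-flight values.

    ```python
    import math

    # Hypothetical cycle model for software-pipelined preloading.
    LATENCY = 70      # main-memory latency in cycles (from the abstract)
    ITER_CYCLES = 10  # compute cycles per loop iteration (assumed)

    # One preloaded value is in flight for each iteration that elapses
    # between issuing the load and consuming it — each needs a register.
    in_flight = math.ceil(LATENCY / ITER_CYCLES)
    print(in_flight)  # → 7 registers' worth of in-flight values

    # Without preloading, every load stalls for the full latency; with
    # preloading issued in_flight iterations ahead, the stall vanishes.
    stall_without = LATENCY
    stall_with = max(0, LATENCY - in_flight * ITER_CYCLES)
    print(stall_without, stall_with)  # → 70 0
    ```

    This also suggests why more registers hide longer latencies: the register file bounds how many loads can be in flight, which is the resource the slide-window structure expands.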

  • International Conference on Supercomputing - A scalar architecture for pseudo vector processing based on slide-windowed registers
    Proceedings of the 7th international conference on Supercomputing - ICS '93, 1993
    Co-Authors: Hiroshi Nakamura, Hiromitsu Imori, Kisaburo Nakazawa, Taisuke Boku, Ikuo Nakata, Hideo Wada, Yasuhiro Inagami, Y. Yamashita
    Abstract:

    In this paper, we present a new scalar architecture for high-speed vector processing. Without using cache memory, the proposed architecture tolerates main memory access latency by introducing slide-windowed floating-point registers with a data preloading feature and pipelined memory. The architecture maintains upward compatibility with existing scalar architectures. In the new architecture, software can control the window structure. This is an advantage over our previous register-window work: registers are utilized more flexibly and computational efficiency is greatly enhanced. Furthermore, this flexibility helps the compiler generate efficient object code easily. We have evaluated its performance on the Livermore Fortran Kernels. The evaluation results show that the proposed architecture reduces the penalty of main memory access better than an ordinary scalar processor and a processor with cache prefetching. The proposed architecture with 64 registers tolerates a memory access latency of 30 CPU cycles. Compared with our previous work, the proposed architecture hides longer memory access latency with fewer registers.

  • ICCD - A superscalar RISC processor with pseudo vector processing feature
    Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors, 1995
    Co-Authors: Kotaro Shimamura, S. Tanaka, Tetsuya Shimomura, Takashi Hotta, E. Kamada, Hideo Sawamoto, T. Shimizu, Kisaburo Nakazawa
    Abstract:

    A novel architectural extension, in which floating-point data are transferred directly from main memory to floating-point registers, has been successfully implemented in a superscalar RISC processor. This extension allows main memory access throughput of 1.2 Gbyte/s, and effective performance reaches 267 MFLOPS (89% of the peak performance) for typical floating-point applications. The processor utilizes 0.3-micron 4-level metal CMOS technology with a 2.5 V power supply and contains 3.9 million transistors in a 15.7 mm × 15.7 mm die. Only 4.5% of the die area is used for the extension. Pipeline stage optimization and a scoreboard-based dependency check method allow the extension to be realized without affecting the operating frequency.

Dean M. Tullsen - One of the best experts on this subject based on the ideXlab platform.

  • HPCA - MemPod: A Clustered Architecture for Efficient and Scalable Migration in Flat Address Space Multi-level Memories
    2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017
    Co-Authors: Andreas Prodromou, Mitesh R. Meswani, Nuwan Jayasena, Gabriel H. Loh, Dean M. Tullsen
    Abstract:

    In the near future, die-stacked DRAM will be increasingly present in conjunction with off-chip memories in hybrid memory systems. Research on this subject revolves around using the stacked memory as a cache or as part of a flat address space. This paper proposes MemPod, a scalable and efficient memory management mechanism for flat-address-space hybrid memories. MemPod monitors memory activity and periodically migrates the most frequently accessed memory pages to the faster on-chip memory. MemPod's partitioned architectural organization allows for efficient scaling with memory system capabilities. Further, a big-data analytics algorithm is adapted to develop an efficient, low-cost activity-tracking technique. MemPod improves the average main memory access time of multi-programmed workloads by up to 29% (9% on average) compared to the state of the art, and that improvement will increase as the differential between memory speeds widens. MemPod's novel activity-tracking approach leads to a significant cost reduction (12800x lower storage space requirements) and improved future-prediction accuracy over prior work, which maintains a separate counter per page.
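
    The low-cost activity tracking can be sketched with a frequent-items counter. Here a Misra-Gries-style counter stands in for the paper's adapted analytics algorithm (an assumption, not MemPod's exact mechanism), and the page numbers, counter budget, and migration count are invented. The property it shares with the paper's approach is the cost saving: a small fixed set of counters instead of one counter per page.

    ```python
    # Interval-based hot-page tracking with a Misra-Gries frequent-items
    # counter (a stand-in for MemPod's activity-tracking algorithm).
    K = 4  # counter budget: far fewer counters than pages (assumed)

    def track(accesses, k=K):
        """Approximate the most frequently accessed pages in an interval."""
        counters = {}
        for page in accesses:
            if page in counters:
                counters[page] += 1
            elif len(counters) < k:
                counters[page] = 1
            else:                      # budget full: decrement-all step
                for p in list(counters):
                    counters[p] -= 1
                    if counters[p] == 0:
                        del counters[p]
        return counters

    # Hypothetical access trace for one migration interval.
    interval = [1, 2, 1, 3, 1, 2, 4, 1, 5, 1, 2]
    hot = track(interval)
    migrate = sorted(hot, key=hot.get, reverse=True)[:2]
    print(migrate)  # → [1, 2]: candidates for migration to fast memory
    ```

    Any page accessed more than len(interval)/(K+1) times is guaranteed to survive in the counters, which is what makes such sketches usable for picking migration candidates at a fraction of the per-page-counter storage cost.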

Peter Boncz - One of the best experts on this subject based on the ideXlab platform.

  • Breaking the memory wall in MonetDB
    Communications of the ACM, 2008
    Co-Authors: Peter Boncz, Martin L Kersten, Stefan Manegold
    Abstract:

    In the past decades, advances in speed of commodity CPUs have far outpaced advances in RAM latency. Main-memory access has therefore become a performance bottleneck for many computer applications; a phenomenon that is widely known as the "memory wall." In this paper, we report how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall. This encompasses (i) a redesign of the query execution model to better exploit pipelined CPU architectures and CPU instruction caches; (ii) the use of columnar rather than row-wise data storage to better exploit CPU data caches; (iii) the design of new cache-conscious query processing algorithms; and (iv) the design and automatic calibration of memory cost models to choose and tune these cache-conscious algorithms in the query optimizer.

  • Optimizing database architecture for the new bottleneck: memory access
    The VLDB Journal, 2000
    Co-Authors: Stefan Manegold, Peter Boncz, Martin L Kersten
    Abstract:

    In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.