Capacity Miss


The Experts below are selected from a list of 39 Experts worldwide, ranked by the ideXlab platform.

Rahman Lavaee - One of the best experts on this subject based on the ideXlab platform.

  • The hardness of data packing
    ACM SIGPLAN Notices, 2016
    Co-Authors: Rahman Lavaee
    Abstract:

    A program can benefit from improved cache block utilization when contemporaneously accessed data elements are placed in the same memory block. This can reduce the program's memory block working set and thereby reduce the Capacity Miss rate. We formally define the problem of data packing for an arbitrary number of blocks in the cache and an arbitrary packing factor (the number of data objects fitting in a cache block), and study how well the optimal solution can be approximated for the two dual problems. On the one hand, we show that the cache hit maximization problem is approximable within a constant factor for every fixed number of blocks in the cache. On the other hand, we show that unless P=NP, the cache Miss minimization problem cannot be efficiently approximated.
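
To make the packing effect concrete, here is a minimal sketch (not the paper's formal construction): it replays one access trace through an LRU cache of k blocks under two hypothetical packings with packing factor two. The trace, the object names, and both packings are invented for illustration; co-locating contemporaneously accessed objects shrinks the block working set and halves the Miss count.

```python
from collections import OrderedDict

def lru_misses(trace, packing, k):
    """Replay an object-access trace through a k-block LRU cache.
    `packing` maps each data object to the memory block holding it."""
    cache, misses = OrderedDict(), 0
    for obj in trace:
        block = packing[obj]
        if block in cache:
            cache.move_to_end(block)       # hit: refresh LRU position
        else:
            misses += 1                    # block not resident: a Miss
            cache[block] = True
            if len(cache) > k:
                cache.popitem(last=False)  # evict least recently used block
    return misses

# Hypothetical trace: objects a,b are always accessed together, as are c,d.
trace = ["a", "b", "c", "d"] * 1000

# Packing factor 2 (two objects per block), cache of k = 1 block.
good = {"a": 0, "b": 0, "c": 1, "d": 1}  # co-accessed objects share a block
bad  = {"a": 0, "b": 1, "c": 0, "d": 1}  # co-accessed objects split apart

print(lru_misses(trace, good, k=1))  # 2000: two block fetches per round
print(lru_misses(trace, bad,  k=1))  # 4000: every access misses
```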

  • POPL - The hardness of data packing
    Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016
    Co-Authors: Rahman Lavaee
    Abstract:

    A program can benefit from improved cache block utilization when contemporaneously accessed data elements are placed in the same memory block. This can reduce the program's memory block working set and thereby reduce the Capacity Miss rate. We formally define the problem of data packing for an arbitrary number of blocks in the cache and an arbitrary packing factor (the number of data objects fitting in a cache block), and study how well the optimal solution can be approximated for the two dual problems. On the one hand, we show that the cache hit maximization problem is approximable within a constant factor for every fixed number of blocks in the cache. On the other hand, we show that unless P=NP, the cache Miss minimization problem cannot be efficiently approximated.

Yingchao Zhou - One of the best experts on this subject based on the ideXlab platform.

  • IPDPS - Impact of page size on communication performance
    19th IEEE International Parallel and Distributed Processing Symposium, 2005
    Co-Authors: Xiaocheng Zhou, Zhigang Huo, Ninghui Sun, Yingchao Zhou
    Abstract:

    In this paper, the impact of page size on communication performance is studied. In the interconnect communication of a cluster system, the address translation table (ATT), which resides in the memory of the network interface card (NIC) and can in a way be seen as the translation look-aside buffer (TLB) of the NIC processor, is used by the NIC to translate virtual addresses into physical addresses. The operating system's page size affects not only the compulsory and Capacity Miss rates of the ATT, but in some implementations also its hit time and Miss penalty. With a large page size, we get a lower ATT Miss rate and a shorter hit time and Miss penalty, which improves communication performance. To test the impact of page size, a Linux module for the AMD Opteron processor is implemented to allocate both normal pages and super pages, and the address translation mechanism in Myrinet GM is extended to support either kind. With super pages, the latency of the ping-pong test can be reduced by 4.3 us and the bandwidth improved by 55.3 MB/s in some cases. Linpack tests on the 11 TFLOPS Dawning 4000A show that Linpack efficiency can be increased by 0.66% to 2.86%, depending on the number of processors.
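
A back-of-the-envelope model shows why larger pages lower the ATT Miss rate: each ATT entry maps one page, so the table's reach grows linearly with page size. The sketch below uses a hypothetical entry count and buffer size (not figures from the paper) and assumes an ideal fully associative ATT with uniformly random page touches.

```python
def att_coverage(entries, page_size):
    """Memory reachable without an ATT Miss: one entry maps one page."""
    return entries * page_size

def steady_state_miss_rate(buffer_bytes, entries, page_size):
    """Miss rate for uniform random single-page touches over a
    registered buffer, assuming an ideal fully associative ATT."""
    pages = buffer_bytes // page_size
    resident = min(entries, pages)   # pages the ATT can hold at once
    return max(0.0, 1 - resident / pages)

ENTRIES = 4096              # hypothetical ATT capacity (entries)
BUF = 512 * 2**20           # hypothetical 512 MiB registered buffer
for name, psize in [("4 KiB normal", 4 * 2**10), ("2 MiB super", 2 * 2**20)]:
    reach = att_coverage(ENTRIES, psize) / 2**20
    miss = steady_state_miss_rate(BUF, ENTRIES, psize)
    print(f"{name:>12} pages: reach {reach:8.0f} MiB, Miss rate {miss:.2%}")
```

With these assumed numbers, 4 KiB pages give the ATT a 16 MiB reach and a ~97% Miss rate over the 512 MiB buffer, while 2 MiB super pages cover the whole buffer and eliminate ATT Misses entirely.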

Abdulwahab Alazeb - One of the best experts on this subject based on the ideXlab platform.

  • Cache Coherence Mechanisms
    2015
    Co-Authors: Abdulwahab Almakdi Alazeb, Mohammed Alshehri (Saudi Arabia)
    Abstract:

    Many modern computing architectures that utilize dedicated caches rely on coherency mechanisms to maintain consistency across those caches (2). These mechanisms, the focus of this paper, rely on underlying hardware synchronicity to resolve the value of a particular piece of data at a given instant, based on the discrete way in which processor instructions execute each clock cycle with the corresponding memory accesses following (2), (4). Inconsistencies are introduced when data is written and noticed when data is read. The goal of this paper is to explore the idiosyncrasies of the coherence mechanisms used with dedicated caches by studying two common types, snoop-based and directory-based, and simulating their operation on top of a simulated architecture consisting of multiple processing cores and a layered cache system with dedicated caches. We implement snoopy and directory protocols and measure the hit rate, compulsory Miss rate, Capacity Miss rate, and coherence Misses for each one. In addition, we show how each scheme is affected by block size and cache size.
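
As a minimal sketch of the kind of simulation described, the following models a write-invalidate MSI snooping protocol over a shared bus with two cores. Cache capacity is unbounded here, so only compulsory and coherence Misses arise; the class names and the ping-pong write workload are illustrative assumptions, not the authors' implementation.

```python
M, S, I = "Modified", "Shared", "Invalid"

class SnoopCache:
    """One core's cache under a write-invalidate MSI snooping protocol.
    Capacity is unbounded, so every Miss is compulsory or coherence."""
    def __init__(self, bus):
        self.state = {}          # block address -> M / S / I
        self.bus = bus
        self.misses = {"compulsory": 0, "coherence": 0}

    def read(self, addr):
        if self.state.get(addr) in (M, S):
            return               # hit
        kind = "coherence" if self.state.get(addr) == I else "compulsory"
        self.misses[kind] += 1
        self.bus.broadcast("BusRd", addr, origin=self)
        self.state[addr] = S

    def write(self, addr):
        if self.state.get(addr) == M:
            return               # hit, already exclusive
        if addr not in self.state:
            self.misses["compulsory"] += 1
        elif self.state[addr] == I:
            self.misses["coherence"] += 1
        # An S -> M upgrade is not counted as a Miss (simplification).
        self.bus.broadcast("BusRdX", addr, origin=self)
        self.state[addr] = M

    def snoop(self, op, addr):
        if addr not in self.state:
            return
        if op == "BusRdX":
            self.state[addr] = I   # another core wants exclusive access
        elif op == "BusRd" and self.state[addr] == M:
            self.state[addr] = S   # downgrade and supply the dirty block

class Bus:
    def __init__(self):
        self.caches = []
    def broadcast(self, op, addr, origin):
        for c in self.caches:      # snooping: every other cache observes
            if c is not origin:
                c.snoop(op, addr)

bus = Bus()
c0, c1 = SnoopCache(bus), SnoopCache(bus)
bus.caches = [c0, c1]

# Two cores ping-pong writes to one block: classic coherence Misses.
for _ in range(4):
    c0.write(0x40)
    c1.write(0x40)
print(c0.misses, c1.misses)   # 1 compulsory + 3 coherence Misses each
```

A directory-based variant would replace the bus broadcast with a lookup of the sharer set kept at the block's home node, trading broadcast traffic for directory storage and an extra lookup.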

Kai-feng Wang - One of the best experts on this subject based on the ideXlab platform.

  • Path-based next N trace prefetch in trace processors
    Microprocessors and Microsystems, 2005
    Co-Authors: Kai-feng Wang
    Abstract:

    The performance of a trace processor rests to a great extent on trace cache efficiency. A higher trace cache Miss rate reduces performance significantly because, when a trace cache Miss occurs, no traces can be dispatched to the back-end PEs until construction of the Missing trace completes. When running applications with large working sets, a high Capacity Miss rate is inevitable given the relatively small Capacity of the trace cache, and as application working sets keep growing this problem will become more severe. To address the high Capacity Miss rate, this paper augments the conventional one-level trace cache with a second-level trace cache. We find that the two-level trace cache alone improves performance only in a limited way because of the long access latency of the second level. To hide that latency, a path-based next-N trace prefetch mechanism is proposed: it prefetches the trace N ahead of the currently running trace, guided by path-based next-N trace prediction, an extension of the path-based next-trace predictor. Simulation results show that with a prefetch distance of three, the mechanism attains an 11.3% performance improvement over the conventional one-level trace cache for eight SPECint95 benchmarks.
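
The prediction side of the mechanism can be sketched as a table keyed by a path signature. The code below is illustrative, not the paper's design: it assumes the signature is the tuple of the last `depth` trace IDs and learns which trace tends to appear N traces later, which is the trace a prefetcher would pull from the second-level into the first-level trace cache.

```python
from collections import deque

class NextNTracePredictor:
    """Path-based next-N trace prediction (illustrative sketch).
    The path signature is the tuple of the last `depth` trace IDs;
    the table learns which trace appeared N traces after that path."""
    def __init__(self, depth=3, n=3):
        self.depth, self.n = depth, n
        self.history = deque(maxlen=depth + n)  # recent trace IDs
        self.table = {}                         # path -> trace seen N later

    def observe(self, trace_id):
        self.history.append(trace_id)
        if len(self.history) == self.history.maxlen:
            # The oldest `depth` IDs formed a path; `trace_id` arrived
            # n traces after that path completed, so learn the pair.
            path = tuple(list(self.history)[:self.depth])
            self.table[path] = trace_id

    def predict(self):
        path = tuple(list(self.history)[-self.depth:])
        return self.table.get(path)             # trace worth prefetching

pred = NextNTracePredictor(depth=3, n=3)
loop = ["A", "B", "C", "D", "E", "F"]           # hypothetical trace stream
prefetched = []
for trace_id in loop * 3:
    hint = pred.predict()
    if hint is not None:
        prefetched.append(hint)  # would issue an L2->L1 trace prefetch
    pred.observe(trace_id)

print(pred.predict())  # after path ('D','E','F'): predicts 'C', 3 ahead
```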

Xiaocheng Zhou - One of the best experts on this subject based on the ideXlab platform.

  • IPDPS - Impact of page size on communication performance
    19th IEEE International Parallel and Distributed Processing Symposium, 2005
    Co-Authors: Xiaocheng Zhou, Zhigang Huo, Ninghui Sun, Yingchao Zhou
    Abstract:

    In this paper, the impact of page size on communication performance is studied. In the interconnect communication of a cluster system, the address translation table (ATT), which resides in the memory of the network interface card (NIC) and can in a way be seen as the translation look-aside buffer (TLB) of the NIC processor, is used by the NIC to translate virtual addresses into physical addresses. The operating system's page size affects not only the compulsory and Capacity Miss rates of the ATT, but in some implementations also its hit time and Miss penalty. With a large page size, we get a lower ATT Miss rate and a shorter hit time and Miss penalty, which improves communication performance. To test the impact of page size, a Linux module for the AMD Opteron processor is implemented to allocate both normal pages and super pages, and the address translation mechanism in Myrinet GM is extended to support either kind. With super pages, the latency of the ping-pong test can be reduced by 4.3 us and the bandwidth improved by 55.3 MB/s in some cases. Linpack tests on the 11 TFLOPS Dawning 4000A show that Linpack efficiency can be increased by 0.66% to 2.86%, depending on the number of processors.