Associative Cache

The experts below are selected from a list of 2,337 experts worldwide, ranked by the ideXlab platform.

Chuanjun Zhang - One of the best experts on this subject based on the ideXlab platform.

  • A Way-Halting Cache for Low-Energy High-Performance Systems
    2013
    Co-Authors: Chuanjun Zhang, Frank Vahid, Jun Yang, Walid Najjar
    Abstract:

    Caches contribute to much of a microprocessor system's power and energy consumption. We have developed a new cache architecture, called a way-halting cache, that reduces energy while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags into a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array pre-determines which tags cannot match due to their low-order four bits mismatching. Further accesses to ways with known mismatching tags are then halted, thus saving power. Our halt tag array has the additional feature of using static logic only, rather than the dynamic logic used in highly associative caches. We provide data from experiments on 17 benchmarks drawn from MediaBench and SPEC 2000, based on our layouts in 0.18 micron CMOS technology. On average, 55% savings of memory-access related energy were obtained over a conventional four-way set-associative cache. We show that the energy savings are greater than those of previous methods, and nearly twice those of highly-associative caches, while imposing no performance overhead and only 2% cache area overhead.

  • A Way-Halting Cache for Low-Energy High-Performance Systems
    2008
    Co-Authors: Chuanjun Zhang, Frank Vahid, Jun Yang, Walid Najjar
    Abstract:

    Caches contribute to much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way-predicting, reactive-associative, way-shutdown, way-concatenating, and highly-associative caches, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than the previously mentioned architectures, while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags into a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array predetermines which tags cannot match due to their low-order 4 bits mismatching. Further accesses to ways with known mismatching tags are then halted, thus saving power. Our halt tag array has the additional feature of using static logic only, rather than the dynamic logic used in highly associative caches, making our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, MediaBench, and SPEC 2000, based on our layouts in 0.18 micron CMOS technology. On average, we obtained 55% savings of memory-access related energy over a conventional four-way set-associative cache.
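
The halting mechanism described in the two abstracts above is simple enough to capture in a few lines. Below is a minimal Python sketch (not the authors' hardware design; the sizes and field widths are illustrative), showing how a 4-bit halt tag per way can veto the full, energy-costly tag comparison. Counting full tag compares stands in for tag-array energy:

```python
# Minimal sketch of a way-halting lookup, assuming a 4-way cache with
# 32-byte lines and a 4-bit halt tag per way. Layout-level details of
# the real design are not modeled.

NUM_SETS = 64
NUM_WAYS = 4
OFFSET_BITS = 5                     # 32-byte lines
INDEX_BITS = 6                      # log2(NUM_SETS)
HALT_BITS = 4                       # low-order tag bits in the halt tag array

def split(addr):
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag

class WayHaltingCache:
    def __init__(self):
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.halt = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.full_compares = 0      # proxy for tag-array energy

    def fill(self, addr, way):
        index, tag = split(addr)
        self.tags[index][way] = tag
        self.halt[index][way] = tag & ((1 << HALT_BITS) - 1)

    def lookup(self, addr):
        index, tag = split(addr)
        halt_tag = tag & ((1 << HALT_BITS) - 1)
        for way in range(NUM_WAYS):
            # Searched in parallel with set indexing in hardware: a way
            # whose 4-bit halt tag mismatches is halted outright.
            if self.halt[index][way] != halt_tag:
                continue
            self.full_compares += 1  # only non-halted ways pay this cost
            if self.tags[index][way] == tag:
                return True
        return False

cache = WayHaltingCache()
cache.fill(0x12345680, way=0)
assert cache.lookup(0x12345680)     # hit: one full compare
cache.lookup(0x99990000)            # miss: all ways halted by 4 bits
print("full tag compares:", cache.full_compares)
```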

  • An Efficient Direct-Mapped Instruction Cache for Application-Specific Embedded Systems
    International Conference on Hardware Software Codesign and System Synthesis, 2005
    Co-Authors: Chuanjun Zhang
    Abstract:

    Caches may consume half of a microprocessor's total power, and cache misses incur off-chip memory accesses, which are both time consuming and energy costly. Therefore, minimizing cache power consumption and reducing cache misses are important for reducing the total energy consumption of embedded systems. Direct-mapped caches consume much less power than same-sized set-associative caches, but on average have a poorer hit rate. Through experiments, we observe that the memory space of direct-mapped instruction caches is not used efficiently in most embedded applications. We design an efficient cache: a configurable instruction cache whose sets can be tuned, via index remapping, to be utilized efficiently for a particular application. Experiments on 11 benchmarks drawn from MediaBench show that the efficient cache achieves almost the same miss rate as a conventional two-way set-associative cache on average, with total memory-access energy savings of 30% compared with the conventional two-way set-associative cache.
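
The abstract does not give the exact remapping function, so the hedged sketch below uses a common stand-in: XOR-folding a configurable slice of upper address bits into the set index of a direct-mapped cache, then picking the configuration with the fewest misses for a given application. All parameters are hypothetical:

```python
# Per-application index remapping for a direct-mapped cache, using an
# XOR-fold of upper line-address bits as an illustrative remapping
# function (not necessarily the authors').

NUM_SETS = 256
OFFSET_BITS = 5          # 32-byte lines

def miss_count(trace, remap_shift):
    """Direct-mapped cache simulation; remap_shift selects which upper
    line-address bits are XOR-folded into the index (0 = conventional)."""
    tags = [None] * NUM_SETS
    misses = 0
    for addr in trace:
        line = addr >> OFFSET_BITS
        index = line & (NUM_SETS - 1)
        if remap_shift:
            index ^= (line >> remap_shift) & (NUM_SETS - 1)
        if tags[index] != line:
            misses += 1
            tags[index] = line
    return misses

# Two instruction streams whose lines collide under conventional indexing:
trace = [base + i * 32
         for i in range(64)
         for base in (0x00000000, 0x00100000)] * 4

# "Tuning for a particular application": pick the best configuration.
best = min(range(0, 17, 4), key=lambda s: miss_count(trace, s))
print("conventional misses:", miss_count(trace, 0))
print("remapped (shift=%d) misses:" % best, miss_count(trace, best))
```

On this trace the conventional index thrashes (every access misses), while the tuned remapping spreads the two streams across disjoint sets, leaving only compulsory misses.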

Jih-kwon Peir - One of the best experts on this subject based on the ideXlab platform.

  • Capturing Dynamic Memory Reference Behavior with Adaptive Cache Topology
    Architectural Support for Programming Languages and Operating Systems, 1998
    Co-Authors: Jih-kwon Peir, Yongjoon Lee, Windsor W. Hsu
    Abstract:

    Memory references exhibit locality and are therefore not uniformly distributed across the sets of a cache. This skew reduces the effectiveness of a cache because it results in the caching of a considerable number of less-recently-used lines, which are less likely to be re-referenced before they are replaced. In this paper, we describe a technique that dynamically identifies these less-recently-used lines and effectively utilizes the cache frames they occupy to more accurately approximate the global least-recently-used replacement policy while maintaining the fast access time of a direct-mapped cache. We also explore the idea of using these underutilized cache frames to reduce cache misses through data prefetching. In the proposed design, the possible locations that a line can reside in are not predetermined. Instead, the cache is dynamically partitioned into groups of cache lines. Because both the total number of groups and the individual group associativity adapt to the dynamic reference pattern, we call this design the adaptive group-associative cache. Performance evaluation using trace-driven simulations of the TPC-C benchmark and selected programs from the SPEC95 benchmark suite shows that the group-associative cache is able to achieve a hit ratio that is consistently better than that of a 4-way set-associative cache. For some of the workloads, the hit ratio approaches that of a fully-associative cache.
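
As a rough illustration of the idea (the paper's actual grouping, directory, and replacement mechanisms are more elaborate, and all sizes here are invented), the sketch below keeps the fast direct-mapped home lookup but parks recently-used victims in not-recently-used frames elsewhere, tracked by a small out-of-position directory:

```python
# Much-simplified group-associative sketch: recently-used lines survive
# conflicts by relocating into underutilized frames, approximating
# global LRU. The "used" bits never decay in this toy.

from collections import OrderedDict

NUM_FRAMES = 8
DIR_SIZE = 4                  # out-of-position directory entries

class GroupAssociativeCache:
    def __init__(self):
        self.frame_tag = [None] * NUM_FRAMES    # line held in each frame
        self.used = [False] * NUM_FRAMES        # "recently used" bit
        self.ooo = OrderedDict()                # parked line -> frame

    def access(self, line):
        home = line % NUM_FRAMES
        if self.frame_tag[home] == line:        # fast direct-mapped hit
            self.used[home] = True
            return "hit"
        if line in self.ooo:                    # out-of-position hit
            self.ooo.move_to_end(line)
            return "ooo-hit"
        # Miss: a recently-used home line is relocated rather than
        # evicted, into some not-recently-used frame elsewhere.
        victim = self.frame_tag[home]
        if victim is not None and self.used[home]:
            nru = next((f for f in range(NUM_FRAMES) if not self.used[f]),
                       None)
            if nru is not None and len(self.ooo) < DIR_SIZE:
                self.ooo.pop(self.frame_tag[nru], None)  # drop old parkee
                self.frame_tag[nru] = victim
                self.ooo[victim] = nru
        self.frame_tag[home] = line
        self.used[home] = True
        return "miss"

cache = GroupAssociativeCache()
for line in [0, 8, 0, 8, 16, 0]:      # 0, 8, 16 all share home frame 0
    print(line, cache.access(line))   # second access to 0 hits out of position
```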

Hai Jin - One of the best experts on this subject based on the ideXlab platform.

  • Trade-Off Between Hit Rate and Hit Latency for Optimizing DRAM Cache
    IEEE Transactions on Emerging Topics in Computing, 2021
    Co-Authors: Pai Chen, Jianhui Yue, Xiaofei Liao, Hai Jin
    Abstract:

    Due to its large storage capacity, high bandwidth, and low latency, 3D DRAM has been proposed as the last-level cache, referred to as a DRAM cache. Hit rate and hit latency are two conflicting optimization goals for a DRAM cache. To address this tension, we design a new DRAM cache organization, referred to as SODA-Cache, that trades a slightly lower hit rate for shorter hit latency by means of a way-locator cache and a novel cache-set layout. SODA-Cache adopts a 2-way set-associative cache, motivated by the observation that, on the path from a direct-mapped cache to a highly associative cache, the 2-way set-associative configuration provides the largest hit-rate improvement. The proposed way-locator cache and the novel set layout effectively reduce the cache-hit latency. We use the SPEC CPU2006 benchmarks to evaluate our design against two state-of-the-art DRAM cache designs. Experimental results show that SODA-Cache improves the hit rate by 8.1 percent compared with Alloy-Cache and reduces average access latency by 23.1, 13.2, and 8.6 percent compared with LH-Cache, Alloy-Cache, and ATCache, respectively, on average. Accordingly, SODA-Cache outperforms LH-Cache, Alloy-Cache, and ATCache by, on average, 17, 12.8, and 8.4 percent, respectively, in terms of weighted speedup.
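
The way-locator idea can be modeled abstractly. In the hypothetical sketch below (probe latencies, table sizes, and replacement policies are invented, not taken from the paper), a locator hit steers a 2-way set-associative DRAM cache access straight to the correct way, costing one DRAM probe instead of two:

```python
# Hedged latency model of a way-locator in front of a 2-way DRAM cache.

DRAM_PROBE = 50               # cycles per DRAM cache probe (hypothetical)
LOCATOR_SIZE = 1024           # lines tracked by the on-chip SRAM locator
NUM_SETS = 4096

class WayLocatorDramCache:
    def __init__(self):
        self.ways = [{}, {}]  # per-way: set index -> resident line
        self.locator = {}     # line -> way, bounded like a small cache

    def _remember(self, line, way):
        if len(self.locator) >= LOCATOR_SIZE:
            self.locator.pop(next(iter(self.locator)))  # FIFO-ish victim
        self.locator[line] = way

    def access(self, line):
        s = line % NUM_SETS
        way = self.locator.get(line)
        if way is not None and self.ways[way].get(s) == line:
            return DRAM_PROBE            # locator hit: single probe
        for w in (0, 1):                 # locator miss: probe both ways
            if self.ways[w].get(s) == line:
                self._remember(line, w)
                return 2 * DRAM_PROBE
        self.ways[s % 2][s] = line       # miss fill (trivial way choice)
        self._remember(line, s % 2)
        return 2 * DRAM_PROBE            # off-chip fill cost omitted

cache = WayLocatorDramCache()
print(cache.access(42))   # 100: miss, installed, locator updated
print(cache.access(42))   # 50: the locator avoids the second probe
```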

Moinuddin K Qureshi - One of the best experts on this subject based on the ideXlab platform.

  • MIRAGE: Mitigating Conflict-Based Cache Attacks with a Practical Fully-Associative Design
    USENIX Security Symposium, 2021
    Co-Authors: Gururaj Saileshwar, Moinuddin K Qureshi
    Abstract:

    Shared processor caches are vulnerable to conflict-based side-channel attacks, where an attacker can monitor the access patterns of a victim by evicting victim cache lines using cache-set conflicts. Recent mitigations propose randomized mapping of addresses to cache lines to obfuscate the locations of set conflicts. However, these are vulnerable to new attacks that discover conflicting sets of addresses despite such mitigations, because these designs select eviction candidates from a small set of conflicting lines. This paper presents Mirage, a practical design for a fully associative cache, wherein eviction candidates are selected randomly from all lines resident in the cache, making it immune to set conflicts. A key challenge in enabling such designs for large shared caches (containing tens of thousands of cache lines) is the complexity of cache lookup, as a naive design could require searching through all the resident lines. Mirage achieves full associativity while retaining practical set-associative lookups by decoupling placement and replacement, using pointer-based indirection from the tag store to the data store to allow a newly installed address to globally evict the data of any random resident line. To eliminate set conflicts, Mirage provisions extra invalid tags in a skewed-associative tag-store design where lines can be installed without set conflicts, along with a load-aware skew-selection policy that guarantees the availability of sets with invalid tags. Our analysis shows that Mirage provides the global-eviction property of a fully associative cache throughout the system lifetime (violations of full associativity, i.e., set conflicts, occur less than once in 10^4 to 10^17 years), thus offering a principled defense against any eviction-set discovery and any potential conflict-based attacks. Mirage incurs a limited slowdown (2%) and 17-20% extra storage compared to a non-secure cache.
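
A compact way to see how Mirage decouples placement from replacement is to model the two stores and the pointers between them. The sketch below is a functional toy, not the secure design: the cryptographically randomized skewing hashes are replaced by Python's hash(), sizes are arbitrary, and lines are assumed not to be re-installed:

```python
# Mirage-style structure: over-provisioned skewed-associative tag store,
# flat data store, forward/reverse pointers, and global random eviction.

import random

NUM_SETS = 32
WAYS_PER_SKEW = 12            # over-provisioned tag store
DATA_LINES = 256              # fewer data lines than tag entries (768)

class Mirage:
    def __init__(self):
        # Tag entry: None (invalid) or [line, data_index] (forward pointer)
        self.tags = [[[None] * WAYS_PER_SKEW for _ in range(NUM_SETS)]
                     for _ in range(2)]
        self.data = [None] * DATA_LINES   # reverse pointer: data -> tag
        self.free = list(range(DATA_LINES))

    def _set(self, skew, line):
        return hash((skew, line)) % NUM_SETS   # placeholder skewing hash

    def install(self, line):
        # Load-aware skew selection: prefer the candidate set with more
        # invalid tags, so tag-side set conflicts essentially never occur.
        cands = []
        for skew in (0, 1):
            s = self._set(skew, line)
            invalid = [w for w in range(WAYS_PER_SKEW)
                       if self.tags[skew][s][w] is None]
            cands.append((len(invalid), skew, s, invalid))
        n_inv, skew, s, invalid = max(cands)
        assert n_inv > 0, "set conflict (provisioned to be negligible)"
        if not self.free:
            # Global eviction: the data victim is random over ALL
            # resident lines, not drawn from the incoming line's set.
            victim = random.randrange(DATA_LINES)
            vskew, vs, vway = self.data[victim]
            self.tags[vskew][vs][vway] = None  # invalidate via back pointer
            self.free.append(victim)
        d = self.free.pop()
        self.tags[skew][s][invalid[0]] = [line, d]
        self.data[d] = (skew, s, invalid[0])

m = Mirage()
for line in range(300):       # overfill so global evictions must happen
    m.install(line)
print("resident lines:", DATA_LINES - len(m.free))   # 256
```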

  • The V-Way Cache: Demand-based associativity via global replacement
    Proceedings - International Symposium on Computer Architecture, 2005
    Co-Authors: Moinuddin K Qureshi, David Thompson, Yale N. Patt
    Abstract:

    As processor speeds increase and memory latency becomes more critical, intelligent design and management of secondary caches becomes increasingly important. The efficiency of current set-associative caches is reduced because programs exhibit a non-uniform distribution of memory accesses across different cache sets. We propose a technique to vary the associativity of a cache on a per-set basis in response to the demands of the program. By increasing the number of tag-store entries relative to the number of data lines, we achieve the performance benefit of global replacement while maintaining the constant hit latency of a set-associative cache. The proposed variable-way, or V-Way, set-associative cache achieves an average miss-rate reduction of 13% on sixteen benchmarks from the SPEC CPU2000 suite. This translates into an average IPC improvement of 8%.
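
The V-Way structure can be sketched in the same spirit, with invented sizes and a simple saturating reuse-counter sweep standing in for the paper's reuse-based global replacement: twice as many tag entries as data lines, so a hot set can hold more lines than a conventional cache of the same data capacity would fix per set:

```python
# Hedged V-Way sketch: demand-based per-set associativity plus global
# replacement over the data store.

NUM_SETS = 16
TAG_WAYS = 8                            # tag entries per set
DATA_LINES = NUM_SETS * TAG_WAYS // 2   # data store is half the tag store

class VWayCache:
    def __init__(self):
        self.tags = [dict() for _ in range(NUM_SETS)]  # line -> data index
        self.rev = [None] * DATA_LINES                 # data index -> (set, line)
        self.reuse = [0] * DATA_LINES                  # saturating counters
        self.free = list(range(DATA_LINES))
        self.hand = 0                                  # global sweep pointer

    def access(self, line):
        s = line % NUM_SETS
        d = self.tags[s].get(line)
        if d is not None:                              # hit
            self.reuse[d] = min(self.reuse[d] + 1, 3)
            return True
        if len(self.tags[s]) >= TAG_WAYS:              # rare: tag set full,
            old_line = next(iter(self.tags[s]))        # fall back to a
            self.free.append(self.tags[s].pop(old_line))  # local eviction
        if not self.free:
            # Global replacement: sweep, decrementing counters, until a
            # zero-reuse line is found anywhere in the data store.
            while self.reuse[self.hand] > 0:
                self.reuse[self.hand] -= 1
                self.hand = (self.hand + 1) % DATA_LINES
            vset, vline = self.rev[self.hand]
            del self.tags[vset][vline]                 # invalidate its tag
            self.free.append(self.hand)
            self.hand = (self.hand + 1) % DATA_LINES
        d = self.free.pop()
        self.tags[s][line] = d
        self.rev[d] = (s, line)
        self.reuse[d] = 0
        return False

cache = VWayCache()
hot = [i * NUM_SETS for i in range(6)]   # six lines, all mapping to set 0
for _ in range(3):
    for line in hot:
        cache.access(line)
print("set 0 holds", len(cache.tags[0]),
      "lines (a conventional cache of this data capacity is 4-way)")
```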

Pai Chen - One of the best experts on this subject based on the ideXlab platform.

  • Trade-Off Between Hit Rate and Hit Latency for Optimizing DRAM Cache
    IEEE Transactions on Emerging Topics in Computing, 2021
    Co-Authors: Pai Chen, Jianhui Yue, Xiaofei Liao, Hai Jin
    Abstract:

    Due to its large storage capacity, high bandwidth, and low latency, 3D DRAM has been proposed as the last-level cache, referred to as a DRAM cache. Hit rate and hit latency are two conflicting optimization goals for a DRAM cache. To address this tension, we design a new DRAM cache organization, referred to as SODA-Cache, that trades a slightly lower hit rate for shorter hit latency by means of a way-locator cache and a novel cache-set layout. SODA-Cache adopts a 2-way set-associative cache, motivated by the observation that, on the path from a direct-mapped cache to a highly associative cache, the 2-way set-associative configuration provides the largest hit-rate improvement. The proposed way-locator cache and the novel set layout effectively reduce the cache-hit latency. We use the SPEC CPU2006 benchmarks to evaluate our design against two state-of-the-art DRAM cache designs. Experimental results show that SODA-Cache improves the hit rate by 8.1 percent compared with Alloy-Cache and reduces average access latency by 23.1, 13.2, and 8.6 percent compared with LH-Cache, Alloy-Cache, and ATCache, respectively, on average. Accordingly, SODA-Cache outperforms LH-Cache, Alloy-Cache, and ATCache by, on average, 17, 12.8, and 8.4 percent, respectively, in terms of weighted speedup.