Instruction Cache

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The Experts below are selected from a list of 10608 Experts worldwide ranked by ideXlab platform

Wei Zhang - One of the best experts on this subject based on the ideXlab platform.

  • ISORC - A Real-Time Instruction Cache with High Average-Case Performance
    2014 IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2014
    Co-Authors: Yijie Huangfu, Wei Zhang
    Abstract:

    Cache memories, while useful for improving the average-case performance of general-purpose applications, are not suitable for real-time systems because of their timing unpredictability. In this paper, we propose a Performance Enhancement Guaranteed Cache (PEG-C) to ensure performance improvement in the worst case while achieving average-case performance as good as that of a regular hardware-controlled Cache. We design and evaluate an Instruction PEG-C as a proof of concept. Our experiments indicate that with a small number of preloaded Instructions, the PEG Instruction Cache can achieve the same performance as a regular Instruction Cache and significantly outperform Cache locking.
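
The contrast the abstract draws, between a locked (preloaded) Cache whose behavior is fully predictable and a regular hardware-managed Cache, can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's PEG-C design; the line size, cache geometry, and trace below are made up.

```python
# Toy comparison: a locked cache (contents fixed after preloading, so every
# access is statically predictable) vs. a regular fully associative LRU cache.
from collections import OrderedDict

LINE = 16  # hypothetical cache line size in bytes

def locked_hits(trace, preloaded_lines):
    """Hits in a locked cache: an access hits iff its line was preloaded."""
    return sum(1 for addr in trace if addr // LINE in preloaded_lines)

def lru_hits(trace, num_lines):
    """Hits in a fully associative LRU cache holding `num_lines` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        line = addr // LINE
        if line in cache:
            hits += 1
            cache.move_to_end(line)   # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits

# A small loop touching two lines repeatedly, plus one cold access.
trace = [0, 4, 16, 20] * 4 + [256]
print(locked_hits(trace, {0, 1}), lru_hits(trace, 2))
```

With the two hot lines preloaded, the locked cache hits on every loop access and its hit count can be derived offline, which is the predictability that real-time analysis needs; the LRU cache does nearly as well here but only a simulation (or analysis) of the full history reveals that.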

  • SAC - Stack distance based worst-case Instruction Cache performance analysis
    Proceedings of the 2011 ACM Symposium on Applied Computing - SAC '11, 2011
    Co-Authors: Yu Liu, Wei Zhang
    Abstract:

    Worst-case execution time (WCET) analysis is critical to ensuring the schedulability and correctness of hard real-time systems. Modern microprocessors, however, complicate WCET analysis, mainly because of performance-acceleration features such as Caches, pipelines, and out-of-order execution. This paper presents an accurate static timing analysis approach for Instruction Caches with an LRU replacement strategy, based on computing the worst-case stack distance. The experimental results indicate that our approach can accurately predict worst-case Instruction Cache performance. The stack-distance-based timing analysis can also efficiently categorize worst-case Instruction Cache misses into cold, conflict, and capacity misses, providing useful insights for improving worst-case Instruction Cache performance.
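
The stack-distance idea underlying the analysis can be illustrated on a concrete access trace. This is a hypothetical sketch of the dynamic notion (reuse distance in a fully associative LRU Cache), not the paper's static worst-case algorithm; the trace and capacity are made up.

```python
# For a fully associative LRU cache of capacity C, an access hits exactly
# when its reuse distance (the number of distinct lines touched since the
# last access to the same line) is less than C. A first access has
# infinite distance and is a cold miss.
def reuse_distances(trace):
    last_seen = {}
    distances = []
    for i, line in enumerate(trace):
        if line in last_seen:
            # distinct lines touched strictly between the two accesses
            distances.append(len(set(trace[last_seen[line] + 1 : i])))
        else:
            distances.append(float("inf"))  # cold miss
        last_seen[line] = i
    return distances

def classify(trace, capacity):
    """Label each access 'cold', 'hit', or 'capacity' (fully associative)."""
    labels = []
    for d in reuse_distances(trace):
        if d == float("inf"):
            labels.append("cold")
        elif d < capacity:
            labels.append("hit")
        else:
            labels.append("capacity")
    return labels

trace = ["a", "b", "c", "a", "b", "d", "a"]
print(classify(trace, capacity=3))
```

Because the hit/miss outcome is a pure function of the distance and the capacity, bounding the worst-case stack distance of each access immediately bounds its worst-case hit/miss behavior, which is what makes the metric attractive for WCET analysis.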

  • Evaluating Instruction Cache vulnerability to transient errors
    ACM SIGARCH Computer Architecture News, 2007
    Co-Authors: Jun Yan, Wei Zhang
    Abstract:

    Recent research shows that microprocessors are increasingly susceptible to transient errors. To protect microprocessors cost-effectively, the first step is to accurately understand the impact of transient errors on system reliability. While many research efforts have focused on the vulnerability of data Caches and other on-chip hardware components, Instruction Caches have received less attention. However, Instructions are read every cycle, so any undetected or uncorrected soft error in an Instruction can lead to erroneous computation, wrong control flow, or a system crash. This paper studies Instruction Cache vulnerability by considering both the raw SRAM soft error rate and the Cache vulnerability factor. Based on the concept of the Cache vulnerability factor, we also investigate the impact of different Cache configuration parameters on the reliability of Instruction Caches. We find that on average 67.5% of Instruction Cache soft errors can be masked by the I-Cache itself without impacting other system components. While quantifying the Instruction Cache vulnerability does not by itself solve the reliability problem of Instruction Caches against transient errors, we believe this work can provide useful insights for designers to develop cost-effective solutions to protect I-Caches and to optimally balance the reliability of Instruction Caches with other system goals, such as cost, performance, and energy.
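
The notion of a Cache vulnerability factor can be sketched with a simplified model: for a read-only Instruction line, a soft error matters only if it strikes between the fill and the last read before eviction, because bits corrupted after the last read are never fetched again. This is an illustrative simplification, not the paper's exact model; the cycle numbers below are hypothetical.

```python
# Vulnerability factor = vulnerable resident time / total resident time,
# summed over all residency intervals of the cache's lines.
def vulnerability_factor(intervals):
    """intervals: list of (fill_cycle, last_read_cycle, evict_cycle)."""
    vulnerable = sum(last_read - fill for fill, last_read, _ in intervals)
    resident = sum(evict - fill for fill, _, evict in intervals)
    return vulnerable / resident

# One line filled at cycle 0, last read at 60, evicted at 100; another
# filled at 100, last read at 110, evicted at 150.
print(vulnerability_factor([(0, 60, 100), (100, 110, 150)]))
```

In this model, configurations that shorten the fill-to-last-read window (for example, earlier eviction of dead lines) lower the factor, which mirrors the paper's observation that a large share of soft errors is masked by the I-Cache itself.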

  • Compiler-guided next sub-bank prediction for reducing Instruction Cache leakage energy
    Journal of Embedded Computing, 2006
    Co-Authors: Wei Zhang
    Abstract:

    With the scaling of technology, leakage energy reduction has become increasingly important, especially for Cache memories. Recent studies of drowsy Instruction Caches show that the leakage energy of the Instruction Cache can be significantly reduced, with little performance degradation, by exploiting Instruction spatial locality at the Cache sub-bank level [5]. To hide the performance penalty due to the sub-bank wake-up latency, a prediction buffer is used to predict and pre-activate the next sub-bank at runtime. However, consulting the prediction buffer on every Cache access consumes non-trivial dynamic energy, which can substantially compromise the overall energy savings. This paper presents a more energy-efficient compiler-guided approach that captures the sub-bank transition behavior at link time and pre-activates the Instruction Cache sub-bank at runtime based on compiler-directed hints. We also propose a hybrid approach that exploits both static and dynamic information to further reduce the performance penalty with little dynamic energy overhead. Our experiments reveal that the static approach is very successful in capturing the sub-bank transition behavior to reduce the performance penalty, and it also reduces 38.2% more leakage energy than the hardware-based approach, taking the dynamic energy overhead into account. Moreover, our results show that the hybrid approach is the best strategy for the drowsy Instruction Cache to balance leakage energy reduction and performance.

  • CASES - Static next sub-bank prediction for drowsy Instruction Cache
    Proceedings of the 2004 international conference on Compilers architecture and synthesis for embedded systems - CASES '04, 2004
    Co-Authors: Bramha Allu, Wei Zhang
    Abstract:

    As feature sizes shrink, leakage energy reduction has become increasingly important, especially for Cache memories. Recent research on drowsy Instruction Caches shows that the leakage energy of the Instruction Cache can be significantly reduced, with little performance degradation, by exploiting Instruction spatial locality at the Cache sub-bank level [5]. The performance penalty due to the sub-bank wake-up latency is dramatically reduced by using a prediction buffer to pre-activate the next sub-bank at runtime. However, consulting the prediction buffer on every Cache access consumes non-trivial dynamic energy, which can substantially compromise the overall energy savings. This paper proposes a static approach that captures the sub-bank transition behavior at link time and pre-activates the Instruction Cache sub-bank at runtime according to compiler-directed hints. We also propose a hybrid approach that exploits both static and dynamic information to further reduce the performance penalty with little dynamic energy overhead. Our experiments reveal that the static approach is very successful in capturing the sub-bank transition behavior to reduce the performance penalty, and it also reduces 38.2% more leakage energy than the hardware-based approach, taking the dynamic energy overhead into account. Moreover, our results show that the hybrid approach is the best strategy for the drowsy Instruction Cache to balance leakage energy reduction and performance.
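
The hardware prediction buffer that both papers improve upon can be sketched as a last-outcome predictor over sub-bank transitions: for each sub-bank, remember which sub-bank followed it last time and pre-activate that one. This is a hypothetical illustration of the baseline mechanism, not the compiler-guided scheme itself; the trace is made up.

```python
# Last-outcome next-sub-bank predictor: a correct prediction means the
# next sub-bank was woken up in advance, hiding the wake-up latency.
def predict_transitions(bank_trace):
    table = {}        # last observed successor per sub-bank
    correct = 0
    for prev, nxt in zip(bank_trace, bank_trace[1:]):
        if table.get(prev) == nxt:
            correct += 1      # predicted bank was pre-activated: no stall
        table[prev] = nxt     # learn the new successor
    return correct, len(bank_trace) - 1

# A loop bouncing between sub-banks 0 and 1, then a jump to sub-bank 2.
trace = [0, 1, 0, 1, 0, 1, 2]
print(predict_transitions(trace))
```

The compiler-guided variant replaces this runtime table lookup, and its per-access dynamic energy cost, with hints derived from the program's link-time layout, which is why it recovers the energy the buffer consumes.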

Ingjer Huang - One of the best experts on this subject based on the ideXlab platform.

  • A Trace-Capable Instruction Cache for Cost-Efficient Real-Time Program Trace Compression in SoC
    IEEE Transactions on Computers, 2011
    Co-Authors: Chunhung Lai, Fuching Yang, Ingjer Huang
    Abstract:

    This paper presents a novel approach that makes the on-chip Instruction Cache of a SoC function simultaneously as a regular Instruction Cache and a real-time program trace compressor, named the trace-capable Cache (TC-Cache). This is accomplished by exploiting the dictionary feature of the Instruction Cache with a small support circuit attached to the side of the Cache. Compared with related work, this approach has the advantage of utilizing the existing Instruction Cache, which is indispensable in modern SoCs, and thus saves a significant amount of hardware resources and power. The TC-Cache can be configured to work simultaneously as the Instruction Cache and the trace compressor (the online mode) or exclusively as the trace compressor (the bypass mode). An RTL implementation of a 4 KB trace-capable Instruction Cache, a 4 KB data Cache, and an academic ARM processor core has been completed. The experiments show that the TC-Cache achieves an average compression ratio of 90 percent with a very small hardware overhead of 3,652 gates (1.1 percent). It takes only 0.2 percent additional system power for online-mode operation. In addition, the trace support circuit does not impair the global critical path. Therefore, the proposed approach is a highly feasible on-chip debugging/monitoring solution for SoCs, even for cost-sensitive ones such as consumer electronics. Furthermore, the same concept can be applied to the data Cache to compress the data address trace as well.
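
The dictionary idea at the heart of the TC-Cache can be sketched in a few lines: on a hit, the trace stream only needs a small (set, way) index, since a debugger mirroring the Cache state can reconstruct the full address; on a miss, the full address is emitted. This is an illustrative sketch, not the paper's RTL; the Cache geometry and trace are hypothetical.

```python
# The instruction cache doubles as the compression dictionary: hits emit
# a compact (set, way) token, misses emit the full address and update the
# dictionary exactly as the cache itself would.
LINE, SETS, WAYS = 16, 4, 2   # tiny hypothetical geometry

def compress(trace):
    cache = [[None] * WAYS for _ in range(SETS)]  # tags per set/way
    out = []
    for addr in trace:
        line = addr // LINE
        s = line % SETS
        tag = line // SETS
        if tag in cache[s]:
            out.append(("hit", s, cache[s].index(tag)))  # small index only
        else:
            cache[s] = [tag] + cache[s][:-1]             # MRU-first insert
            out.append(("miss", addr))                   # full address
    return out

trace = [0, 64, 0, 64, 128]
print(compress(trace))
```

Because instruction fetches hit overwhelmingly often, most trace entries shrink to a few bits, which is the source of the high compression ratio reported, and the dictionary costs no extra storage because it is the Cache that must exist anyway.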

  • A Trace-Capable Instruction Cache for Cost-Efficient Real-Time Program Trace Compression in SoC
    Design Automation Conference, 2009
    Co-Authors: Chunhung Lai, Fuching Yang, Chungfu Kao, Ingjer Huang
    Abstract:

    This paper presents a novel approach that makes the on-chip Instruction Cache of a SoC function simultaneously as a regular Instruction Cache and a real-time program trace compressor. This goal is accomplished by exploiting the dictionary feature of the Instruction Cache with a small support circuit attached to the side of the Cache. The trace compression works in both the bypass mode and the online mode. Compared with related work, this approach has the advantage of utilizing the existing Instruction Cache, which is indispensable in modern SoCs, and thus saves a significant amount of hardware resources. An RTL implementation of a 4 KB trace-capable Instruction Cache, a 4 KB data Cache, and an academic ARM7 processor core has been completed. The experiments show that the Cache achieves an average compression ratio of 90% with a very small hardware overhead of 3,652 gates. In addition, the trace support circuit does not impact the global critical path. Therefore, the proposed approach is a highly feasible on-chip debugging/monitoring solution for SoCs, even for cost-sensitive ones such as consumer electronics.


Harmon - One of the best experts on this subject based on the ideXlab platform.

  • RTSS - Bounding worst-case Instruction Cache performance
    Proceedings Real-Time Systems Symposium REAL-94, 1994
    Co-Authors: Arnold, Mueller, Whalley, Harmon
    Abstract:

    The use of Caches poses a difficult tradeoff for architects of real-time systems. While Caches provide significant performance advantages, they have also been viewed as inherently unpredictable, since the behavior of a Cache reference depends on the history of previous references. Caches are suitable for real-time systems only if a reasonably tight bound on the performance of programs using Cache memory can be predicted. This paper describes an approach for bounding the worst-case Instruction Cache performance of large code segments. First, a new method called static Cache simulation analyzes a program's control flow to statically categorize the caching behavior of each Instruction. A timing analyzer, which uses the categorization information, then estimates the worst-case Instruction Cache performance for each loop and function in the program.
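
The categories that static Cache simulation produces (always-hit, always-miss, first-miss) can be illustrated by concretely simulating a loop body in a direct-mapped Cache. This is a toy sketch: the real analysis derives these categories statically from the control-flow graph rather than from a trace, and the line addresses here are made up.

```python
# Run a loop body's line accesses for several iterations in a
# direct-mapped cache and label each instruction line by its observed
# hit/miss pattern across iterations.
def categorize(loop_lines, num_sets, iterations=4):
    cache = [None] * num_sets            # one tag per direct-mapped set
    outcomes = {i: [] for i in range(len(loop_lines))}
    for _ in range(iterations):
        for i, line in enumerate(loop_lines):
            s = line % num_sets
            outcomes[i].append(cache[s] == line)
            cache[s] = line
    labels = {}
    for i, hits in outcomes.items():
        if all(hits):
            labels[i] = "always-hit"
        elif not any(hits):
            labels[i] = "always-miss"
        elif not hits[0] and all(hits[1:]):
            labels[i] = "first-miss"     # cold miss, then hits every time
        else:
            labels[i] = "conflicting"
    return labels

# Lines 0 and 4 conflict in a 4-set cache (both map to set 0).
print(categorize([0, 1, 4], num_sets=4))
```

A timing analyzer can then charge each category a fixed worst-case cost per loop iteration (miss latency for always-miss, one miss plus hits for first-miss), which is how the categorization feeds the WCET bound.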
