Superscalar Processor

The Experts below are selected from a list of 3585 Experts worldwide ranked by ideXlab platform

J.P. Shen - One of the best experts on this subject based on the ideXlab platform.

  • Superscalar Processor validation at the microarchitecture level
    Proceedings Twelfth International Conference on VLSI Design. (Cat. No.PR00013), 1999
    Co-Authors: N. Utamaphethai, R.D. Blanton, J.P. Shen
    Abstract:

    We describe a rigorous ATPG-like methodology for validating the branch prediction mechanism of the PowerPC 604, which can be easily generalized and applied to other processors. Test sequences based on finite state machine (FSM) testing are derived from small FSM-like models of the branch prediction mechanism. These sequences are translated into PowerPC instruction sequences. Simulation results show that 100% coverage of the targeted functionality is achieved using a very small number of simulation cycles. Simulation of some real programs against the same targeted functionality produces coverages that range between 34% and 75% with four orders of magnitude more cycles. We also use mutation analysis to modify some functionality of the behavioral model to further illustrate the effectiveness of our generated sequences. Simulation results show that all 54 mutants in the branch prediction functionality can be detected by measuring transition coverage.
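
    The idea of deriving test sequences from a small FSM model and scoring them by transition coverage can be sketched as follows. This is only an illustration: the 2-bit saturating counter, the greedy tour construction, and all function names are assumptions chosen for the example, not the model or algorithm used in the paper.

```python
from collections import deque

STATES = (0, 1, 2, 3)      # assumed encoding: 0 = strongly not-taken ... 3 = strongly taken
INPUTS = (False, True)     # branch outcome: not taken / taken

def next_state(state, taken):
    """Saturating increment on a taken branch, saturating decrement otherwise."""
    return min(state + 1, 3) if taken else max(state - 1, 0)

# Every (state, outcome) pair is one transition of the model: 8 in total.
ALL_TRANSITIONS = {(s, t) for s in STATES for t in INPUTS}

def transition_coverage(outcomes, start=0):
    """Fraction of the model's transitions exercised by a sequence of outcomes."""
    state, seen = start, set()
    for taken in outcomes:
        seen.add((state, taken))
        state = next_state(state, taken)
    return len(seen) / len(ALL_TRANSITIONS)

def transition_tour(start=0):
    """Greedy tour: repeatedly take the shortest path to an uncovered transition."""
    state, seen, tour = start, set(), []
    while seen != ALL_TRANSITIONS:
        # Breadth-first search over states, starting from the current one.
        queue, visited, path = deque([(state, [])]), {state}, None
        while queue and path is None:
            s, prefix = queue.popleft()
            for t in INPUTS:
                if (s, t) not in seen:
                    path = prefix + [t]   # prefix reaches s, t is the new transition
                    break
            else:
                for t in INPUTS:
                    n = next_state(s, t)
                    if n not in visited:
                        visited.add(n)
                        queue.append((n, prefix + [t]))
        # Apply the path, recording every transition it exercises.
        for t in path:
            seen.add((state, t))
            state = next_state(state, t)
            tour.append(t)
    return tour

tour = transition_tour()
always_taken = [True] * 100   # stand-in for a loop-heavy "real program"
```

    In this toy model the generated tour covers every transition in a handful of steps, while a long always-taken sequence never leaves the upper half of the counter. Translating each outcome in the tour into an actual instruction sequence is outside the scope of the sketch.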

  • A framework for statistical modeling of Superscalar Processor performance
    Proceedings Third International Symposium on High-Performance Computer Architecture, 1997
    Co-Authors: D.B. Noonburg, J.P. Shen
    Abstract:

    Presents a statistical approach to modeling superscalar processor performance. Standard trace-driven techniques are very accurate, but require extremely long simulation times, especially as traces reach lengths in the billions of instructions. A framework for statistical models is described which facilitates fast, accurate performance evaluation. A machine model is built up from components: buffers, pipelines, etc. Each program trace is scanned once, generating a set of program parallelism parameters which can be used across an entire family of machine models. The machine model and program parallelism parameters are combined to form a Markov chain. The Markov chain is partitioned in order to reduce the size of the state space, and the resulting linked models are solved using an iterative technique. The use of this framework is demonstrated with two simple processor microarchitectures. The IPC estimates are very close to the IPCs generated by trace-driven simulation of the same microarchitectures. Resource utilization and other performance data can also be obtained from the statistical model.
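
    As a rough illustration of how an IPC estimate can fall out of a Markov chain solved iteratively, the sketch below runs power iteration on a tiny chain. The three-state chain, its transition probabilities, and the reading of state k as "k instructions issued this cycle" are all made up for the example; the paper's machine models, parameters, and partitioning scheme are not reproduced here.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: repeat pi <- pi * P until the distribution stops changing."""
    n = len(P)
    pi = [1.0 / n] * n                     # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 2-wide machine. State k means "k instructions issued this cycle".
# Row i holds the (invented) probabilities of moving from state i to each state.
P = [
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
    [0.1, 0.3, 0.6],
]

pi = stationary_distribution(P)

# Expected instructions per cycle under the long-run state distribution.
ipc = sum(k * p for k, p in enumerate(pi))
```

    The same distribution also yields utilization-style figures, for example `pi[0]` as the long-run fraction of cycles in which nothing issues in this toy chain.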

  • Theoretical modeling of Superscalar Processor performance
    Proceedings of MICRO-27. The 27th Annual IEEE ACM International Symposium on Microarchitecture, 1994
    Co-Authors: D.B. Noonburg, J.P. Shen
    Abstract:

    The current trace-driven simulation approach to determine superscalar processor performance is widely used but has some shortcomings. Modern benchmarks generate extremely long traces, resulting in problems with data storage, as well as very long simulation run times. More fundamentally, simulation generally does not provide significant insight into the factors that determine performance or a characterization of their interactions. This paper proposes a theoretical model of superscalar processor performance that addresses these shortcomings. Performance is viewed as an interaction of program parallelism and machine parallelism. Both program and machine parallelisms are decomposed into multiple component functions. Methods for measuring or computing these functions are described. The functions are combined to provide a model of the interaction between program and machine parallelisms and an accurate estimate of the performance. The computed performance, based on this model, is compared to simulated performance for six benchmarks from the SPEC 92 suite on several configurations of the IBM RS/6000 instruction set architecture.

Eric Rotenberg - One of the best experts on this subject based on the ideXlab platform.

  • AnyCore-1: A comprehensively adaptive 4-way Superscalar Processor
    2016 IEEE Hot Chips 28 Symposium (HCS), 2016
    Co-Authors: Rangeen Basu Roy Chowdhury, Anil K. Kannepalli, Eric Rotenberg
    Abstract:

    Presents a collection of slides covering the following topics: FPGA and the AnyCore-1 microarchitecture.

  • FPGA modeling of diverse Superscalar Processors
    2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012
    Co-Authors: Brandon H. Dwiel, Niket K. Choudhary, Eric Rotenberg
    Abstract:

    There is increasing interest in using Field Programmable Gate Arrays (FPGAs) as platforms for computer architecture simulation. This paper is concerned with modeling superscalar processors with FPGAs. To be transformative, the FPGA modeling framework should meet three criteria. (1) Configurable: The framework should be able to model diverse superscalar processors, like a software model. In particular, it should be possible to vary superscalar parameters such as fetch, issue, and retire widths, depths of pipeline stages, queue sizes, etc. (2) Automatic: The framework should be able to automatically and efficiently map any one of its superscalar processor configurations to the FPGA. (3) Realistic: The framework should model a modern superscalar microarchitecture in detail, ideally with prototype quality, to enable a new era and depth of microarchitecture research. A framework that meets these three criteria will enjoy the convenience of a software model, the speed of an FPGA model, and the experience of a prototype. This paper describes FPGA-Sim, a configurable, automatically FPGA-synthesizable, and register-transfer-level (RTL) model of an out-of-order superscalar processor. FPGA-Sim enables FPGA modeling of diverse superscalar processors out-of-the-box. Moreover, its direct RTL implementation yields the fidelity of a hardware prototype.

  • Coverage of a microarchitecture-level fault check regimen in a Superscalar Processor
    2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN), 2008
    Co-Authors: Vimal Reddy, Eric Rotenberg
    Abstract:

    Conventional processor fault tolerance based on time/space redundancy is robust but prohibitively expensive for commodity processors. This paper explores an unconventional approach to designing a cost-effective fault-tolerant superscalar processor. The idea is to engage a regimen of microarchitecture-level fault checks. A few simple microarchitecture-level fault checks can detect many arbitrary faults in large units, by observing microarchitecture-level behavior and anomalies in this behavior. Previously, we separately proposed checks for the fetch and decode stages, rename stage, and issue stage of a contemporary superscalar processor. While each piece hinted at the possibility of a complete regimen - for an overall fault-tolerant superscalar processor - this totality was not explored. This paper provides the culmination by building a full regimen into a superscalar processor. We show for the first time that the regimen-based approach provides substantial coverage of an entire superscalar processor. Analysis reveals vulnerable areas which should be the focus for regimen additions.

D.B. Noonburg - One of the best experts on this subject based on the ideXlab platform.

  • A framework for statistical modeling of Superscalar Processor performance
    Proceedings Third International Symposium on High-Performance Computer Architecture, 1997
    Co-Authors: D.B. Noonburg, J.P. Shen
    Abstract:

    Presents a statistical approach to modeling superscalar processor performance. Standard trace-driven techniques are very accurate, but require extremely long simulation times, especially as traces reach lengths in the billions of instructions. A framework for statistical models is described which facilitates fast, accurate performance evaluation. A machine model is built up from components: buffers, pipelines, etc. Each program trace is scanned once, generating a set of program parallelism parameters which can be used across an entire family of machine models. The machine model and program parallelism parameters are combined to form a Markov chain. The Markov chain is partitioned in order to reduce the size of the state space, and the resulting linked models are solved using an iterative technique. The use of this framework is demonstrated with two simple processor microarchitectures. The IPC estimates are very close to the IPCs generated by trace-driven simulation of the same microarchitectures. Resource utilization and other performance data can also be obtained from the statistical model.

  • Theoretical modeling of Superscalar Processor performance
    Proceedings of MICRO-27. The 27th Annual IEEE ACM International Symposium on Microarchitecture, 1994
    Co-Authors: D.B. Noonburg, J.P. Shen
    Abstract:

    The current trace-driven simulation approach to determine superscalar processor performance is widely used but has some shortcomings. Modern benchmarks generate extremely long traces, resulting in problems with data storage, as well as very long simulation run times. More fundamentally, simulation generally does not provide significant insight into the factors that determine performance or a characterization of their interactions. This paper proposes a theoretical model of superscalar processor performance that addresses these shortcomings. Performance is viewed as an interaction of program parallelism and machine parallelism. Both program and machine parallelisms are decomposed into multiple component functions. Methods for measuring or computing these functions are described. The functions are combined to provide a model of the interaction between program and machine parallelisms and an accurate estimate of the performance. The computed performance, based on this model, is compared to simulated performance for six benchmarks from the SPEC 92 suite on several configurations of the IBM RS/6000 instruction set architecture.

Hans Jurgen Mattausch - One of the best experts on this subject based on the ideXlab platform.

  • Performance evaluation of Superscalar Processor with multi bank register file using SPEC2000
    Annual Conference on Computers, 2006
    Co-Authors: Kazuya Tanigawa, Tetsuo Hironaka, Moto Maeda, Tetsuya Sueyoshi, Kenichi Aoyama, Tetsushi Koide, Hans Jurgen Mattausch
    Abstract:

    Recently, register files in highly parallel superscalar processors tend to have a large chip area and many access ports. This trend causes problems with chip size, access time and power consumption. As one approach to solving these problems, researchers have proposed several methods that use a multi-bank register file instead of a multi-port register file, and we have proposed a method that achieves higher performance than the others. In this paper, we evaluate the effectiveness of our method by software simulation using SPECint2000. The results show that a superscalar processor using our proposed method has only 1% performance degradation in a cycle-based comparison with a conventional multi-port register file, under the condition that each register bank in the multi-bank register file has two read ports and two write ports. Additionally, our method keeps performance degradation to only 3% even if each register bank has only one port.
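
    To see why the number of ports per bank matters, the following sketch counts how many cycles a group of register reads needs when accesses to the same bank must share that bank's ports. The low-order interleaving of registers across banks and the function name are assumptions made for illustration; the abstract does not describe the authors' bank assignment or access scheduling, and this is not their method.

```python
from collections import Counter
from math import ceil

def read_cycles(regs, num_banks, ports_per_bank):
    """Cycles needed to serve all register reads requested together.

    Assumes register r lives in bank r % num_banks (hypothetical mapping) and
    that each bank serves at most ports_per_bank reads per cycle. Repeated
    reads of the same register are counted separately.
    """
    if not regs:
        return 0
    per_bank = Counter(r % num_banks for r in regs)
    # The most heavily loaded bank determines how long the group takes.
    return max(ceil(count / ports_per_bank) for count in per_bank.values())

conflict_free = read_cycles([0, 1, 2, 3], num_banks=4, ports_per_bank=1)
one_port = read_cycles([0, 4, 8, 1], num_banks=4, ports_per_bank=1)
two_ports = read_cycles([0, 4, 8, 1], num_banks=4, ports_per_bank=2)
```

    Registers 0, 4 and 8 all fall into bank 0 under this mapping, so the second and third calls show the same request group costing more cycles with one port per bank than with two.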

  • Design of Superscalar Processor with multi bank register file
    International Symposium on Circuits and Systems, 2005
    Co-Authors: T Saito, Kazuya Tanigawa, Tetsuo Hironaka, Moto Maeda, Tetsuya Sueyoshi, Kenichi Aoyama, Tetsushi Koide, Hans Jurgen Mattausch
    Abstract:

    Recently, register files in highly parallel superscalar processors tend to have large chip areas and many access ports. This trend causes problems with chip size, access time and power consumption. As one of the methods for solving these problems, we have proposed a multi-bank register file which realizes small area, high speed and low power consumption. We have proved the effectiveness of this method by simulation. We now show a detailed design of a superscalar processor with a multi-bank register file and its evaluation results. In the Verilog-HDL design, the processor with the multi-bank register file improves register access speed by 49% at the cost of 28% more gates for register-access scheduling. These results verify that we have solved the problem of shortening the critical path around the register file in highly parallel processors.

  • Superscalar Processor with multi bank register file
    Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05), 2005
    Co-Authors: Tetsuo Hironaka, Kazuya Tanigawa, Moto Maeda, Tetsuya Sueyoshi, Kenichi Aoyama, Tetsushi Koide, Hans Jurgen Mattausch, T Saito
    Abstract:

    Register files in highly parallel superscalar processors tend to have a large chip area and many access ports. This trend causes problems with chip size, access time and power consumption. As one of the methods for solving these problems, we have proposed a multi-bank register file which realizes small area, high speed and low power consumption. We have proved the effectiveness of this method by software simulation, and by designing it in detail as a synthesizable Verilog-HDL description with a full-custom multi-bank register file. In this paper, we show the detailed architecture of a superscalar processor with the multi-bank register file and its evaluation results.

Andreas Steininger - One of the best experts on this subject based on the ideXlab platform.

  • A Fail-Silent Reconfigurable Superscalar Processor
    13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007), 2007
    Co-Authors: Thomas Kottke, Andreas Steininger
    Abstract:

    We propose a reconfigurable superscalar processor with two modes of operation: In safety mode the two pipelines run in lock step, executing the same instruction sequence, thus allowing hardware failures to be detected. In performance mode different instruction streams are executed in parallel, just like in a standard superscalar processor. Considering that many embedded applications comprise a mixture of safety-critical and non-safety-critical functions, the ability to dynamically switch between the two modes allows an efficient utilization of the duplicated pipeline. To complement the error detection enabled by the duplicated pipeline, non-duplicated components such as the register file are secured by parity. A systematic failure analysis shows that the proposed implementation can indeed detect all single faults in safety mode and that the ability to switch modes does not compromise the fail-safe property. These encouraging results are finally confirmed by extensive fault injection experiments.
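
    The two detection mechanisms named above, lock-step comparison of duplicated pipelines and parity on a non-duplicated register file, can be sketched in software as follows. Everything here is a simplified assumption for illustration: pipelines are modeled as plain functions, parity is a single even-parity bit per word, and the class and function names are hypothetical rather than taken from the paper.

```python
class FaultDetected(Exception):
    """Raised when a check fails; the system would then go silent."""

def parity(word):
    """Even-parity bit of a non-negative integer (1 if the count of 1 bits is odd)."""
    return bin(word).count("1") & 1

class ParityRegisterFile:
    """Non-duplicated register file protected by one parity bit per word."""

    def __init__(self, size=32):
        self.words = [0] * size
        self.par = [0] * size

    def write(self, idx, value):
        self.words[idx] = value
        self.par[idx] = parity(value)      # parity is stored alongside the data

    def read(self, idx):
        value = self.words[idx]
        if parity(value) != self.par[idx]:
            raise FaultDetected(f"parity error in register {idx}")
        return value

def lockstep_execute(pipe_a, pipe_b, instr):
    """Safety mode: both pipelines run the same instruction; results must agree."""
    result_a, result_b = pipe_a(instr), pipe_b(instr)
    if result_a != result_b:
        raise FaultDetected("lock-step mismatch between pipelines")
    return result_a

def performance_execute(pipe_a, pipe_b, instr_a, instr_b):
    """Performance mode: the pipelines run different instructions, unchecked."""
    return pipe_a(instr_a), pipe_b(instr_b)

# Toy "pipelines": an instruction is a pair of operands to add.
def good_pipe(instr):
    return instr[0] + instr[1]

def faulty_pipe(instr):
    return (instr[0] + instr[1]) ^ 1       # injected single-bit fault in the result
```

    A single parity bit detects any odd number of flipped bits in a word, which matches the single-fault scope of the abstract but would miss a double-bit flip.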