Out-of-Order Execution

The Experts below are selected from a list of 793908 Experts worldwide ranked by ideXlab platform

Tamarah Arons - One of the best experts on this subject based on the ideXlab platform.

  • CAV - Verification of an Advanced MIPS-Type Out-of-Order Execution Algorithm
    Computer Aided Verification, Lecture Notes in Computer Science, 2004
    Co-Authors: Tamarah Arons
    Abstract:

    In this paper we propose a method for the deductive verification of out-of-order scheduling algorithms. We use TLPVS, our PVS model of linear temporal logic (LTL), to deductively verify the correctness of a model based on the MIPS R10000 design. Our proofs use the predicted values method to verify a system including arithmetic and memory operations and speculation. In addition to the abstraction refinement traditionally used to verify safety properties, we also use fairness constraints to prove progress, allowing us to detect errors which may otherwise be overlooked.

  • A Methodology for Deductive Verification of Out-of-Order Execution Systems Based on Predicted Values
    2001
    Co-Authors: Tamarah Arons, Amir Pnueli
    Abstract:

    In this paper we propose a methodology for the deductive verification of out-of-order scheduling algorithms. A "top-down" scheme for the systematic definition of system invariants is defined. The complementary use of predicted values, auxiliary fields storing a dispatch-time prediction of an instruction's value, is proposed as a means of further simplifying the verification of systems in this class. We illustrate the use of the "top-down" methodology and predicted values in the verification of three out-of-order scheduling algorithms, including a detailed discussion of the verification of a model based on the MIPS R10000.

  • VLSI Design - Verifying Tomasulo's algorithm by refinement
    Proceedings Twelfth International Conference on VLSI Design. (Cat. No.PR00013), 1999
    Co-Authors: Tamarah Arons, Amir Pnueli
    Abstract:

    In this paper Tomasulo's algorithm for out-of-order execution is shown to be a refinement of the sequential instruction execution algorithm. Correctness of Tomasulo's algorithm is established by proving that the register files of Tomasulo's algorithm and the sequential algorithm agree once all instructions have been completed.

  • FMCAD - Verification of Data-Insensitive Circuits: An In-Order-Retirement Case Study
    Formal Methods in Computer-Aided Design, 1998
    Co-Authors: Amir Pnueli, Tamarah Arons
    Abstract:

    There is a large class of circuits (including pipeline and out-of-order execution components) which can be formally verified while completely ignoring the precise characteristics (e.g. word-size) of the data manipulated by the circuits. In the literature, this is often described as the use of uninterpreted functions, implying that the concrete operations applied to the data are abstracted into unknown and featureless functions. In this paper, we briefly introduce an abstract unifying model for such data-insensitive circuits, and claim that the development of such models, perhaps even a theory of circuit schemas, can significantly contribute to the development of efficient and comprehensive verification algorithms combining deductive as well as enumerative methods. As a case study, we present in this paper an algorithm for out-of-order execution with in-order retirement and show it to be a refinement of the sequential instruction execution algorithm. Refinement is established by deductively proving (using PVS) that the register files of the out-of-order algorithm and the sequential algorithm agree at all times if the two systems are synchronized at instruction retirement time.
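
The correctness criterion that recurs in the abstracts above (the register file of the out-of-order machine agrees with that of the sequential machine once instructions have completed) can be illustrated with a small executable check. The following is only a minimal sketch under stated assumptions, not the authors' PVS models or proofs: the toy program, the register names, and the helper functions `run_sequential` and `run_out_of_order` are all hypothetical. Operations are kept uninterpreted by building symbolic terms rather than computing numbers, which mirrors the data-insensitive view described in the FMCAD abstract. Testing over random schedules is of course much weaker than a deductive proof.

```python
import random

# Hypothetical toy program: (dest, op, src1, src2). It contains
# read-after-write, write-after-read and write-after-write hazards.
PROGRAM = [
    ("r1", "f", "r2", "r3"),
    ("r4", "g", "r1", "r2"),
    ("r2", "f", "r4", "r4"),
    ("r1", "g", "r3", "r3"),
    ("r3", "f", "r1", "r2"),
]
REGISTERS = ("r1", "r2", "r3", "r4")


def initial_regs():
    # Symbolic initial values: the check never inspects the data itself.
    return {r: ("init", r) for r in REGISTERS}


def run_sequential(program):
    """Reference model: execute one instruction at a time, in program order."""
    regs = initial_regs()
    for dest, op, s1, s2 in program:
        regs[dest] = (op, regs[s1], regs[s2])  # uninterpreted: build a term
    return regs


def run_out_of_order(program, rng):
    """Dispatch in order, execute in any data-ready order, retire in order."""
    regs = initial_regs()
    last_writer = {}  # register name -> index of latest dispatched writer
    entries = []
    for index, (dest, op, s1, s2) in enumerate(program):
        sources = []
        for s in (s1, s2):
            if s in last_writer:
                sources.append(("tag", last_writer[s]))  # wait for producer
            else:
                sources.append(("val", regs[s]))         # value already known
        entries.append({"dest": dest, "op": op, "sources": sources, "result": None})
        last_writer[dest] = index

    def operand(source):
        kind, payload = source
        return payload if kind == "val" else entries[payload]["result"]

    pending = list(range(len(program)))
    while pending:
        ready = [i for i in pending
                 if all(operand(s) is not None for s in entries[i]["sources"])]
        chosen = rng.choice(ready)  # arbitrary scheduling decision
        entry = entries[chosen]
        a, b = (operand(s) for s in entry["sources"])
        entry["result"] = (entry["op"], a, b)
        pending.remove(chosen)

    for entry in entries:  # in-order retirement updates the register file
        regs[entry["dest"]] = entry["result"]
    return regs


reference = run_sequential(PROGRAM)
agree = all(run_out_of_order(PROGRAM, random.Random(seed)) == reference
            for seed in range(100))
print(agree)  # True
```

Because sources are renamed to producer tags at dispatch, later writes to the same register cannot disturb earlier readers, and in-order retirement makes the last writer in program order win, which is why the two register files coincide for every schedule tried.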

Thomas M Conte - One of the best experts on this subject based on the ideXlab platform.

  • IEEE PACT - A fast interrupt handling scheme for VLIW processors
    Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), 1998
    Co-Authors: Emre Ozer, Sumedh W Sathaye, Kishore N Menezes, Sanjeev Banerjia, Matthew D Jennings, Thomas M Conte
    Abstract:

    Interrupt handling in out-of-order execution processors requires complex hardware schemes to maintain the sequential state. The amount of hardware will be substantial in VLIW architectures due to the nature of issuing a very large number of instructions in each cycle. It is hard to implement precise interrupts in out-of-order execution machines, especially in VLIW processors. In this paper, we will apply the reorder buffer with future file and the history buffer methods to a VLIW platform, and present a novel scheme, called the current-state buffer, which employs modest hardware with compiler support. Unlike the other interrupt handling schemes, the current-state buffer does not keep history state, result buffering or bypass mechanisms. It is a fast interrupt handling scheme with a relatively small buffer that records the execution and exception status of operations. It is suitable for embedded processors that require a fast interrupt handling mechanism with modest hardware.
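
To make the notion of a precise interrupt concrete, here is a minimal sketch of a plain reorder buffer, one of the baseline methods the abstract mentions. It is not the paper's current-state buffer, and it omits the future file; the class name `ReorderBuffer` and its methods are hypothetical. Results may complete in any order, but the architectural registers are updated only at in-order retirement, so when a faulting instruction reaches the head, the visible state reflects exactly the instructions that precede it.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order retirement."""

    def __init__(self, regs):
        self.regs = dict(regs)   # architectural (sequential) state
        self.entries = deque()   # program order, oldest entry on the left

    def dispatch(self, dest):
        entry = {"dest": dest, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        # Called whenever a functional unit finishes, in any order.
        entry["value"] = value
        entry["done"] = True
        entry["exception"] = exception

    def retire(self):
        """Retire finished entries in order; return the faulting entry, if any."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()  # squash all younger instructions
                return head
            self.regs[head["dest"]] = head["value"]
        return None


rob = ReorderBuffer({"r1": 0, "r2": 0, "r3": 0})
first = rob.dispatch("r1")
second = rob.dispatch("r2")
third = rob.dispatch("r3")

rob.complete(third, value=30)        # youngest finishes first
early = rob.retire()                 # nothing retires: the head is not done
rob.complete(second, exception=True)
rob.complete(first, value=10)
faulting = rob.retire()              # retires `first`, then stops at `second`

print(rob.regs)  # {'r1': 10, 'r2': 0, 'r3': 0}
```

The result of the youngest instruction was buffered but never reached the register file, which is the result-buffering cost that the abstract says the current-state buffer avoids.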

David Blaauw - One of the best experts on this subject based on the ideXlab platform.

  • CASES - A second-generation sensor network processor with application-driven memory optimizations and Out-of-Order Execution
    Proceedings of the 2005 international conference on Compilers architectures and synthesis for embedded systems - CASES '05, 2005
    Co-Authors: Leyla Nazhandali, M Minuth, Bo Zhai, J Olson, Todd Austin, David Blaauw
    Abstract:

    In this paper we present a second-generation sensor network processor which consumes less than one picoJoule per instruction (typical processors use 100s to 1000s of picoJoules per instruction). As in our first-generation design effort, we strive to build microarchitectures that minimize area to reduce leakage, maximize transistor utility to reduce the energy-optimal voltage, and optimize CPI for efficient processing. The new design builds on our previous work to develop a low-power subthreshold-voltage sensor processor, this time improving the design by focusing on ISA, memory system design, and microarchitectural optimizations that reduce overall design size and improve energy-per-instruction. The new design employs 8-bit datapaths and an ultra-compact 12-bit wide RISC instruction set architecture, which enables high code density via micro-operations and flexible operand modes. The design also features a unique memory architecture with prefetch buffer and predecoded address bits, which permits both faster access to the memory and smaller instructions due to few address bits. To achieve efficient processing, the design incorporates branch speculation and out-of-order execution, but in a simplified form for reduced area and leakage-energy overheads. Using SPICE-level timing and power simulation, we find that these optimizations produce a number of Pareto-optimal designs with varied performance-energy tradeoffs. Our most efficient design is capable of running at 142 kHz (0.1 MIPS) while consuming only 600 fJ/instruction, allowing the processor to run continuously for 41 years on the energy stored in a miniature 1 g lithium-ion battery. Work is ongoing to incorporate this design into an intra-ocular pressure sensor.
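
The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract (600 fJ/instruction, 0.1 MIPS, 142 kHz, 41 years); the length of a year is an assumption (365.25 days), and the calculation assumes continuous execution at the quoted rate with no other energy drains, so the resulting battery energy is an implied figure, not one stated by the authors.

```python
# Figures quoted in the abstract.
ENERGY_PER_INSTRUCTION_J = 600e-15   # 600 fJ per instruction
INSTRUCTIONS_PER_SECOND = 0.1e6      # 0.1 MIPS
CLOCK_HZ = 142e3                     # 142 kHz
LIFETIME_YEARS = 41

# Assumption: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

power_w = ENERGY_PER_INSTRUCTION_J * INSTRUCTIONS_PER_SECOND
energy_j = power_w * LIFETIME_YEARS * SECONDS_PER_YEAR
cycles_per_instruction = CLOCK_HZ / INSTRUCTIONS_PER_SECOND

print(f"average power: {power_w * 1e9:.1f} nW")             # average power: 60.0 nW
print(f"energy over 41 years: {energy_j:.1f} J")            # energy over 41 years: 77.6 J
print(f"cycles per instruction: {cycles_per_instruction:.2f}")  # cycles per instruction: 1.42
```

In other words, the quoted lifetime corresponds to drawing roughly 60 nW continuously, for a total of about 78 J over 41 years.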

David L Dill - One of the best experts on this subject based on the ideXlab platform.

  • Formal Verification of Out-of-Order Execution with Incremental Flushing
    Formal Methods in System Design, 2002
    Co-Authors: Robert B. Jones, Jens U. Skakkebæk, David L Dill
    Abstract:

    We present an approach for formally verifying that a high-level microprocessor model behaves as defined by an instruction-set architecture. The technique is based on a specialization of self consistency called incremental flushing and reduces the need and effort required to create manually-generated implementation abstractions. Additionally, incremental flushing reduces the computational complexity of the proof obligations generated when reasoning about out-of-order execution. This is accomplished by comparing the functional behavior of the implementation abstraction over two sets of inputs: one that represents normal operation and one that is simpler, but functionally equivalent. The approach is illustrated on a simple out-of-order microprocessor core.

  • Applications of symbolic simulation to the formal verification of microprocessors
    1999
    Co-Authors: David L Dill, Robert B. Jones
    Abstract:

    Functional validation is a major problem in the design of complex hardware systems. This is especially true in microprocessor design, where validation consumes a continually increasing percentage of design resources. Formal verification is often proposed as a solution to this problem, but current techniques have had only limited impact because of scaling problems; the complexity of designs is increasing faster than the capacity of formal verification techniques. This dissertation presents new ideas that help close the gap between design complexity and formal verification capacity. Two ideas in symbolic simulation greatly expand the applicability of formal verification to commercial design, even in the short term. These ideas have been successfully applied during the design process to find bugs in large submodules of Intel microprocessors. In the longer term, greater strides can be achieved by moving to design practices that embrace higher-level behavioral descriptions. This work develops a technique for verifying out-of-order execution on such descriptions. Part I describes self consistency, an approach to specification that allows formal verification to be performed without manually creating a formal specification. A reference specification is derived by symbolically simulating the circuit operating with altered inputs or in a different mode. We describe the results of using reference specifications in the verification of two circuits from Intel microprocessor designs. Part II describes the use of parametric representations of Boolean predicates to control state explosion in BDD-based symbolic simulation. A parametric representation encodes a Boolean predicate as a functional vector. This is useful for constraining verification to a care set and decomposing the care set by data-space partitioning. This technique is much simpler than the more standard structural decomposition approach. Verification results are illustrated for two additional circuits from Intel microprocessor designs. Part III describes incremental flushing, a technique for verifying out-of-order execution. Out-of-order execution is a difficult verification problem because of the large effective pipeline depth. Incremental flushing is applied to the verification of an out-of-order execution core. An extension to incremental flushing is developed that reduces the amount of manual abstraction required for verification.

  • CAV - Formal Verification of Out-of-Order Execution Using Incremental Flushing
    Computer Aided Verification, Lecture Notes in Computer Science, 1998
    Co-Authors: Jens U. Skakkebæk, Robert B. Jones, David L Dill
    Abstract:

    We present a two-part approach for verifying out-of-order execution. First, the complexity of out-of-order issue and scheduling is handled by creating an in-order abstraction of the out-of-order execution core. Second, incremental flushing addresses the complexity difficulties encountered by automated abstraction functions on very deep pipelines. We illustrate the techniques on a model of a simple out-of-order processor core.

  • FMCAD - Reducing Manual Abstraction in Formal Verification of Out-of-Order Execution
    Formal Methods in Computer-Aided Design, 1998
    Co-Authors: Robert B. Jones, Jens U. Skakkebæk, David L Dill
    Abstract:

    Several methods have recently been proposed for verifying processors with out-of-order execution. These methods use intermediate abstractions to decompose the verification process into smaller steps. Unfortunately, the process of manually creating intermediate abstractions is very laborious. We present an approach that dramatically reduces the need for an intermediate abstraction, so that only the scheduling logic of the implementation is abstracted. After the abstraction, we apply an enhanced incremental-flushing approach to verify the remaining circuitry by comparing the processor description against itself in a slightly simpler configuration. By induction, we demonstrate that any reachable configuration is equivalent to the simplest possible configuration. Finally, we prove correctness on the simplest configuration. The approach is illustrated with a simple example of an out-of-order execution core.

Emre Ozer - One of the best experts on this subject based on the ideXlab platform.

  • IEEE PACT - A fast interrupt handling scheme for VLIW processors
    Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192), 1998
    Co-Authors: Emre Ozer, Sumedh W Sathaye, Kishore N Menezes, Sanjeev Banerjia, Matthew D Jennings, Thomas M Conte
    Abstract:

    Interrupt handling in out-of-order execution processors requires complex hardware schemes to maintain the sequential state. The amount of hardware will be substantial in VLIW architectures due to the nature of issuing a very large number of instructions in each cycle. It is hard to implement precise interrupts in out-of-order execution machines, especially in VLIW processors. In this paper, we will apply the reorder buffer with future file and the history buffer methods to a VLIW platform, and present a novel scheme, called the current-state buffer, which employs modest hardware with compiler support. Unlike the other interrupt handling schemes, the current-state buffer does not keep history state, result buffering or bypass mechanisms. It is a fast interrupt handling scheme with a relatively small buffer that records the execution and exception status of operations. It is suitable for embedded processors that require a fast interrupt handling mechanism with modest hardware.