X86 System

The Experts below are selected from a list of 114 Experts worldwide ranked by ideXlab platform

Daniel J. Sorin - One of the best experts on this subject based on the ideXlab platform.

  • HPCA - Scalably verifiable dynamic power management
    2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014
    Co-Authors: Opeoluwa Matthews, Meng Zhang, Daniel J. Sorin
    Abstract:

    Dynamic power management (DPM) is critical to maximizing the performance of systems ranging from multicore processors to datacenters. However, one formidable challenge with DPM schemes is verifying that they are correct as the number of computational resources scales up. In this paper, we develop a DPM scheme that is scalably verifiable with fully automated formal tools. The key to the design is that the DPM scheme has fractal behavior; that is, it behaves the same at every scale. We show that the fractal design enables scalable formal verification, and simulation shows that our scheme does not sacrifice much performance compared to an oracle DPM scheme that optimally allocates power to computational resources. We implement our scheme in a 2-socket, 16-core x86 system and experimentally evaluate it.
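
As a rough illustration of what "behaves the same at every scale" can mean, the sketch below applies one budget-splitting rule recursively to a tree of sockets and cores. It is not the scheme from the paper: the proportional rule, the nested-list tree encoding, and the names `demand` and `allocate` are assumptions made for this example only.

```python
from typing import List, Union

# Hypothetical encoding: a leaf is a core's power demand (watts),
# an inner node is a list of children (e.g. a socket holding cores).
Tree = Union[float, List["Tree"]]


def demand(node: Tree) -> float:
    """Total power demand of a subtree."""
    if isinstance(node, list):
        return sum(demand(child) for child in node)
    return float(node)


def allocate(node: Tree, budget: float) -> List[float]:
    """Split `budget` over the leaves of `node`, left to right.

    The same rule runs at every level: each child gets a share of the
    parent's budget proportional to its demand, and a leaf never takes
    more than it asked for. This is an assumed rule, chosen so that the
    granted power can never exceed the budget handed down from above.
    """
    if not isinstance(node, list):
        return [min(float(node), budget)]
    total = demand(node)
    grants: List[float] = []
    for child in node:
        share = 0.0 if total == 0 else budget * demand(child) / total
        grants.extend(allocate(child, share))
    return grants
```

Because every level uses the same rule, the property "the grants under a node never exceed that node's budget" only needs to be argued once for a single node and its children, which is the flavor of argument a self-similar design makes possible.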

Philip M. Watts - One of the best experts on this subject based on the ideXlab platform.

  • Coherence based message prediction for optically interconnected chip multiprocessors
    2015 Design Automation & Test in Europe Conference & Exhibition (DATE), 2015
    Co-Authors: Anouk Van Laer, Philip M. Watts, Muhammad Ridwan Madarbux, Chamath Ellawala, Timothy M. Jones
    Abstract:

    Photonic networks on chip have been proposed to reduce latency and power consumption of on-chip communication in chip multiprocessors. However, in switched photonic networks, the path setup latency can create a high overhead, particularly for the short messages generated by shared memory chip multiprocessors (CMP). This has led to proposals for networks which avoid switching using all-to-all or single writer multiple reader (SWMR) networks, which dramatically increase optical component counts and hence power consumption. In this work we propose a predictor which uses information from the coherence protocol and previously transmitted messages to predict future messages and hence hide the path setup latency by speculatively setting up photonic paths. We show that a direct-mapped predictor can achieve prediction hit rates of up to 85% for PARSEC benchmarks in a 16-core x86 system using the MESI coherence protocol, whereas a more resource-efficient set-associative predictor can still achieve prediction hit rates of up to 75%.
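
To make the idea of a direct-mapped predictor concrete, here is a minimal sketch in which each table slot remembers the last destination seen for one key. The abstract does not say how the predictor is indexed, so the key (source core plus message type), the constant `NUM_MESSAGE_TYPES`, the table size, and all class and method names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

NUM_MESSAGE_TYPES = 8  # assumed number of coherence message types


class DirectMappedPredictor:
    """Toy direct-mapped table: each key maps to exactly one slot."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        # Each slot holds ((source, msg_type), predicted_destination) or None.
        self.table: List[Optional[Tuple[Tuple[int, int], int]]] = (
            [None] * num_entries
        )
        self.hits = 0
        self.lookups = 0

    def _index(self, source: int, msg_type: int) -> int:
        # Assumed index function; keys that collide evict each other.
        return (source * NUM_MESSAGE_TYPES + msg_type) % self.num_entries

    def predict(self, source: int, msg_type: int) -> Optional[int]:
        """Return the predicted destination, or None if there is no entry."""
        slot = self.table[self._index(source, msg_type)]
        if slot is not None and slot[0] == (source, msg_type):
            return slot[1]
        return None

    def observe(self, source: int, msg_type: int, destination: int) -> None:
        """Score the prediction against the real destination, then learn it."""
        self.lookups += 1
        if self.predict(source, msg_type) == destination:
            self.hits += 1
        self.table[self._index(source, msg_type)] = (
            (source, msg_type),
            destination,
        )

    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0
```

A correct prediction is what would let a photonic path be set up before the message arrives. A set-associative variant would let several keys share a smaller number of sets, which saves storage at the cost of some hit rate, in line with the trade-off the abstract reports.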

  • Hot Interconnects - Low Latency Scheduling Algorithm for Shared Memory Communications over Optical Networks
    2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 2013
    Co-Authors: Muhammad Ridwan Madarbux, Anouk Van Laer, Philip M. Watts
    Abstract:

    Optical networks on chip (NoCs) based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMP). However, high performance CMPs use a shared memory model which generates large numbers of short messages, typically on the order of 8-256 B. Messages of this length create high overhead for optical switching systems due to arbitration and switching times. Current schemes only start the arbitration process when the message arrives at the input buffer of the network. In this paper, we propose a scheme which intelligently uses the information from the memory controllers to schedule optical paths. We identified predictable patterns of messages associated with memory operations for a 32-core x86 system using the MESI coherency protocol. We used the first message of each pattern to open the optical paths which will be used by all subsequent messages, thereby eliminating arbitration time for the latter. Without considering the initial request message, this scheme can therefore reduce the time of flight of a data message in the network by 29% and that of a control message by 67%. We demonstrate the benefits of this scheduling algorithm for applications in the PARSEC benchmark suite, with overall average reductions in overhead latency per message of 31.8% for the stream cluster benchmark and 70.6% for the swaptions benchmark.
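
The effect of opening paths on the first message of a pattern can be sketched with a small cost model. Everything numeric here is hypothetical: the 10 ns arbitration cost, the example pattern, and the function name `arbitration_overhead` are assumptions, and the model ignores switching and transmission time.

```python
from typing import List, Set, Tuple

ARBITRATION_NS = 10.0  # hypothetical arbitration cost per path setup

Hop = Tuple[int, int]  # (source node, destination node)


def arbitration_overhead(pattern: List[Hop], preschedule: bool) -> float:
    """Total arbitration time paid by one message pattern.

    Without prescheduling, every message arbitrates when it reaches the
    network input buffer. With prescheduling, the first message still
    arbitrates, but it also opens the paths for the rest of the pattern,
    so later messages find their path already open.
    """
    open_paths: Set[Hop] = set()
    overhead = 0.0
    for position, hop in enumerate(pattern):
        if hop not in open_paths:
            overhead += ARBITRATION_NS
        if preschedule and position == 0:
            # The first message of the pattern opens all follow-up paths.
            open_paths.update(pattern[1:])
    return overhead


# Assumed pattern: a request from node 0 to node 4, then two replies.
pattern = [(0, 4), (4, 0), (4, 0)]
baseline = arbitration_overhead(pattern, preschedule=False)
scheduled = arbitration_overhead(pattern, preschedule=True)
```

In this toy pattern only the initial request pays for arbitration once prescheduling is on, which mirrors the abstract's point that the savings apply to the messages that follow the first one.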

Opeoluwa Matthews - One of the best experts on this subject based on the ideXlab platform.

  • HPCA - Scalably verifiable dynamic power management
    2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014
    Co-Authors: Opeoluwa Matthews, Meng Zhang, Daniel J. Sorin

Muhammad Ridwan Madarbux - One of the best experts on this subject based on the ideXlab platform.

  • Coherence based message prediction for optically interconnected chip multiprocessors
    2015 Design Automation & Test in Europe Conference & Exhibition (DATE), 2015
    Co-Authors: Anouk Van Laer, Philip M. Watts, Muhammad Ridwan Madarbux, Chamath Ellawala, Timothy M. Jones

  • Hot Interconnects - Low Latency Scheduling Algorithm for Shared Memory Communications over Optical Networks
    2013 IEEE 21st Annual Symposium on High-Performance Interconnects, 2013
    Co-Authors: Muhammad Ridwan Madarbux, Anouk Van Laer, Philip M. Watts

Meng Zhang - One of the best experts on this subject based on the ideXlab platform.

  • HPCA - Scalably verifiable dynamic power management
    2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014
    Co-Authors: Opeoluwa Matthews, Meng Zhang, Daniel J. Sorin