Ready Thread

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 18 Experts worldwide, ranked by the ideXlab platform.

Huseini O - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    eScholarship University of California, 2019
    Co-Authors: Rj Halstead, Wa Najjar, Huseini O
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.
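    The switch-on-memory-access policy described in the abstract can be sketched as a toy discrete-event model. This is only an illustration of the scheduling idea, not the MT-FPGA hardware implementation; `simulate_pe` and all parameter names are hypothetical:

```python
from collections import deque

def simulate_pe(num_threads, run_length, mem_latency, switch_cost, total_ops):
    """Toy model of switch-on-memory-access multithreading: a processing
    element (PE) runs one thread until it issues a memory access, then
    switches to the next Ready Thread while the access is in flight.
    Returns the fraction of cycles the PE spent doing useful work."""
    ready = deque(range(num_threads))      # thread ids eligible to run
    pending = {}                           # thread id -> cycle its data returns
    clock = busy = done = 0
    while done < total_ops:
        # wake any suspended threads whose memory access has completed
        for t, due in list(pending.items()):
            if due <= clock:
                ready.append(t)
                del pending[t]
        if not ready:                      # all threads waiting: the PE stalls
            clock = min(pending.values())
            continue
        t = ready.popleft()
        clock += switch_cost               # context-switch overhead
        clock += run_length                # useful work until the next access
        busy += run_length
        done += 1
        pending[t] = clock + mem_latency   # suspend until the data returns
    return busy / clock
```

    With enough Threads the PE's busy fraction approaches `run_length / (run_length + switch_cost)`; with a single Thread it degrades toward `run_length / (run_length + switch_cost + mem_latency)`, which is the latency-masking effect the abstract describes.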

O Huseini - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    2018
    Co-Authors: Robert J. Halstead, Walid Najjar, O Huseini
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Rj Halstead - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    eScholarship University of California, 2019
    Co-Authors: Rj Halstead, Wa Najjar, Huseini O
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Robert J. Halstead - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    2018
    Co-Authors: Robert J. Halstead, Walid Najjar, O Huseini
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Rassul Ayani - One of the best experts on this subject based on the ideXlab platform.

  • Modeling and Simulation of MultiThreaded Architectures
    2016
    Co-Authors: Vladimir Vlassov, Rassul Ayani, Lars-erik Thorelli
    Abstract:

    MultiThreaded architectures are widely used for, among other things, hiding long memory latency. In such an architecture, a number of Threads are allocated to each Processing Element (PE), and whenever a running Thread becomes suspended, the PE switches to the next Ready Thread. We have developed a simulation platform, MTASim, that can be used to test and evaluate various policies and parameters of a multiThreaded computer. The most important features of MTASim are its flexibility and ease of use. The MTASim model is based on finite state machines and can easily be modified and expanded. The simulation platform includes an experiment planner, an interface to PVM for executing independent experiments in parallel, and an interface to Matlab for processing and displaying results. Among other things, MTASim has been used to determine the optimal number of Threads and to evaluate various prefetching strategies and Thread replacement algorithms.
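    The finite-state-machine view of a Thread's lifecycle that the abstract alludes to (Ready, Running, Suspended) can be sketched as follows; the state and event names are illustrative guesses, not MTASim's actual interface:

```python
from enum import Enum, auto

class ThreadState(Enum):
    READY = auto()      # eligible to be dispatched on the PE
    RUNNING = auto()    # currently executing on the PE
    SUSPENDED = auto()  # waiting for a memory access to complete

# Legal transitions of the per-Thread finite state machine
# (hypothetical event names; MTASim's actual model may differ):
TRANSITIONS = {
    (ThreadState.READY, "dispatch"): ThreadState.RUNNING,
    (ThreadState.RUNNING, "memory_access"): ThreadState.SUSPENDED,
    (ThreadState.SUSPENDED, "data_returned"): ThreadState.READY,
}

def step(state, event):
    """Advance one Thread's FSM; illegal transitions raise."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None
```

    Encoding the model as an explicit transition table is what makes this style of simulator easy to modify and expand: adding a policy (e.g., a prefetch-completed event) means adding entries, not rewriting control flow.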

  • Analytical modeling of multiThreaded architectures
    Journal of Systems Architecture, 2000
    Co-Authors: Vladimir Vlassov, Rassul Ayani
    Abstract:

    MultiThreading is used for hiding long memory latency in uniprocessor and multiprocessor computer systems and aims at increasing system efficiency. In such an architecture, a number of Threads are allocated to each processing element (PE), and whenever a running Thread becomes suspended, the PE switches to another Ready Thread. In this paper, we discuss analytical modeling of coarsely multiThreaded architectures and present two analytical models: (i) a deterministic model, where the timing parameters (e.g., context-switching time, Thread run length, and memory latency) are assumed to be constant, and (ii) a stochastic model, where the timing parameters are random variables. Both models provide a framework for studying the dependence of the MTA efficiency on the design parameters of the target architecture and its workload. The deterministic model, as well as an asymptotic bounding analysis of the stochastic model, makes it possible to determine upper bounds and some break points of the MTA efficiency, such as stability (saturation) points, whereas the stochastic model provides a more accurate prediction of the efficiency.
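    A deterministic model of this kind splits into two regimes, which can be sketched with the classic coarse-grained multithreading formulas (the notation below is mine, not necessarily the paper's: N Threads, run length R, context-switch time C, memory latency L). Below saturation the PE still stalls on memory; once N(R + C) ≥ R + C + L, the latency is fully hidden and efficiency is capped at R/(R + C):

```python
def mta_efficiency(n_threads, run_length, switch_cost, latency):
    """Deterministic efficiency of a coarse-grained multiThreaded PE.
    Classic two-regime model (an illustration, not the paper's exact
    formulation): each operation costs one context switch plus one run
    segment, and a fixed-latency memory access is issued after each run."""
    r, c, l, n = run_length, switch_cost, latency, n_threads
    if n * (r + c) >= r + c + l:
        return r / (r + c)        # saturated: latency fully hidden
    return n * r / (r + c + l)    # linear regime: PE stalls on memory

def saturation_point(run_length, switch_cost, latency):
    """Smallest number of Threads at which the PE saturates,
    i.e., the break point the deterministic model exposes."""
    r, c, l = run_length, switch_cost, latency
    n = 1
    while n * (r + c) < r + c + l:
        n += 1
    return n
```

    For example, with R = 10, C = 1, and L = 100, efficiency grows linearly with the Thread count until N = 11, where it saturates at 10/11; beyond that point adding Threads buys nothing, which is exactly the kind of break point the abstract says the model determines.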