Ready Thread

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 18 Experts worldwide, ranked by the ideXlab platform.

Huseini O - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    eScholarship University of California, 2019
    Co-Authors: Rj Halstead, Wa Najjar, Huseini O
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.
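    The switch-on-memory-access policy described in the abstract can be sketched as a toy discrete-event model. This is only an illustration of the scheduling idea, not the MT-FPGA hardware implementation; `simulate_pe` and all parameter names are hypothetical:

```python
from collections import deque

def simulate_pe(num_threads, run_length, mem_latency, switch_cost, total_ops):
    """Toy model of switch-on-memory-access multithreading: a processing
    element (PE) runs one thread until it issues a memory access, then
    switches to the next Ready Thread while the access is in flight.
    Returns the fraction of cycles the PE spent doing useful work."""
    ready = deque(range(num_threads))      # thread ids eligible to run
    pending = {}                           # thread id -> cycle its data returns
    clock = busy = done = 0
    while done < total_ops:
        # wake any suspended threads whose memory access has completed
        for t, due in list(pending.items()):
            if due <= clock:
                ready.append(t)
                del pending[t]
        if not ready:                      # all threads waiting: the PE stalls
            clock = min(pending.values())
            continue
        t = ready.popleft()
        clock += switch_cost               # context-switch overhead
        clock += run_length                # useful work until the next access
        busy += run_length
        done += 1
        pending[t] = clock + mem_latency   # suspend until the data returns
    return busy / clock
```

    With enough Threads the PE's busy fraction approaches `run_length / (run_length + switch_cost)`; with a single Thread it degrades toward `run_length / (run_length + switch_cost + mem_latency)`, which is the latency-masking effect the abstract describes.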

O Huseini - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    2018
    Co-Authors: Robert J. Halstead, Walid Najjar, O Huseini
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Rj Halstead - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    eScholarship University of California, 2019
    Co-Authors: Rj Halstead, Wa Najjar, Huseini O
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Robert J. Halstead - One of the best experts on this subject based on the ideXlab platform.

  • SpVM Acceleration with Latency Masking Threads on FPGAs
    2018
    Co-Authors: Robert J. Halstead, Walid Najjar, O Huseini
    Abstract:

    Long memory latencies are mitigated through the use of large cache hierarchies in multi-core architectures, SIMD execution in GPU architectures, and streaming of data in FPGA-based accelerators. However, none of these approaches benefits irregular applications that exhibit no locality and rely on extensive pointer de-referencing for data accesses. By masking the memory latency, multi-Threaded execution has been demonstrated to deal effectively with such applications. In the MT-FPGA model, a multi-Threaded engine is implemented on the FPGA accelerator specifically for masking the memory latency in the execution of irregular applications: following a memory access, execution is switched to a Ready Thread while the suspended Threads wait for the requested data value to return from memory. The multi-Threaded engine is automatically generated from C code by the CHAT compilation tool and is customized to the specific application. In this paper we use the Sparse Vector Matrix application to evaluate the performance of the MT-FPGA execution and compare it to the latest GPU architectures over a wide range of benchmarks.

Rassul Ayani - One of the best experts on this subject based on the ideXlab platform.

  • Modeling and Simulation of MultiThreaded Architectures
    2016
    Co-Authors: Vladimir Vlassov, Rassul Ayani, Lars-erik Thorelli
    Abstract:

    MultiThreaded architectures are widely used for, among other things, hiding long memory latency. In such an architecture, a number of Threads are allocated to each Processing Element (PE), and whenever a running Thread becomes suspended, the PE switches to the next Ready Thread. We have developed a simulation platform, MTASim, that can be used to test and evaluate various policies and parameters of a multiThreaded computer. The most important features of MTASim are its flexibility and ease of use. The MTASim model is based on finite state machines and can easily be modified and expanded. The simulation platform includes an experiment planner, an interface to PVM for executing independent experiments in parallel, and an interface to Matlab for processing and displaying results. Among other things, MTASim has been used to determine the optimal number of Threads and to evaluate various prefetching strategies and Thread replacement algorithms.
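    The finite-state-machine view of a Thread's lifecycle that the abstract alludes to (Ready, Running, Suspended) can be sketched as follows; the state and event names are illustrative guesses, not MTASim's actual interface:

```python
from enum import Enum, auto

class ThreadState(Enum):
    READY = auto()      # eligible to be dispatched on the PE
    RUNNING = auto()    # currently executing on the PE
    SUSPENDED = auto()  # waiting for a memory access to complete

# Legal transitions of the per-Thread finite state machine
# (hypothetical event names; MTASim's actual model may differ):
TRANSITIONS = {
    (ThreadState.READY, "dispatch"): ThreadState.RUNNING,
    (ThreadState.RUNNING, "memory_access"): ThreadState.SUSPENDED,
    (ThreadState.SUSPENDED, "data_returned"): ThreadState.READY,
}

def step(state, event):
    """Advance one Thread's FSM; illegal transitions raise."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state} on {event}") from None
```

    Encoding the model as an explicit transition table is what makes this style of simulator easy to modify and expand: adding a policy (e.g., a prefetch-completed event) means adding entries, not rewriting control flow.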

  • Analytical modeling of multiThreaded architectures
    Journal of Systems Architecture, 2000
    Co-Authors: Vladimir Vlassov, Rassul Ayani
    Abstract:

    MultiThreading is used for hiding long memory latency in uniprocessor and multiprocessor computer systems and aims at increasing system efficiency. In such an architecture, a number of Threads are allocated to each processing element (PE), and whenever a running Thread becomes suspended, the PE switches to another Ready Thread. In this paper, we discuss analytical modeling of coarsely multiThreaded architectures and present two analytical models: (i) a deterministic model, where the timing parameters (e.g., context-switching time, Thread run length, and memory latency) are assumed to be constant, and (ii) a stochastic model, where the timing parameters are random variables. Both models provide a framework for studying the dependence of the MTA efficiency on the design parameters of the target architecture and its workload. The deterministic model, as well as an asymptotic bounding analysis of the stochastic model, makes it possible to determine upper bounds and some break points of the MTA efficiency, such as stability (saturation) points, whereas the stochastic model provides a more accurate prediction of the efficiency.
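    A deterministic model of this kind splits into two regimes, which can be sketched with the classic coarse-grained multithreading formulas (the notation below is mine, not necessarily the paper's: N Threads, run length R, context-switch time C, memory latency L). Below saturation the PE still stalls on memory; once N(R + C) ≥ R + C + L, the latency is fully hidden and efficiency is capped at R/(R + C):

```python
def mta_efficiency(n_threads, run_length, switch_cost, latency):
    """Deterministic efficiency of a coarse-grained multiThreaded PE.
    Classic two-regime model (an illustration, not the paper's exact
    formulation): each operation costs one context switch plus one run
    segment, and a fixed-latency memory access is issued after each run."""
    r, c, l, n = run_length, switch_cost, latency, n_threads
    if n * (r + c) >= r + c + l:
        return r / (r + c)        # saturated: latency fully hidden
    return n * r / (r + c + l)    # linear regime: PE stalls on memory

def saturation_point(run_length, switch_cost, latency):
    """Smallest number of Threads at which the PE saturates,
    i.e., the break point the deterministic model exposes."""
    r, c, l = run_length, switch_cost, latency
    n = 1
    while n * (r + c) < r + c + l:
        n += 1
    return n
```

    For example, with R = 10, C = 1, and L = 100, efficiency grows linearly with the Thread count until N = 11, where it saturates at 10/11; beyond that point adding Threads buys nothing, which is exactly the kind of break point the abstract says the model determines.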