Hardware Thread

The Experts below are selected from a list of 9,339 Experts worldwide, ranked by the ideXlab platform.

Andreas Koch - One of the best experts on this subject based on the ideXlab platform.

  • An Open-Source Tool Flow for the Composition of Reconfigurable Hardware Thread Pool Architectures
    2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015
    Co-Authors: Jens Korinth, David De La Chevallerie, Andreas Koch
    Abstract:

    With heterogeneous parallel computing becoming more accessible from general-purpose languages, such as directive-enhanced C/C++ or X10, it is now profitable to exploit the highly energy-efficient operation of reconfigurable accelerators in such frameworks. A common paradigm to present the accelerator to the programmer is as a pool of individual Threads, each executed on dedicated Hardware. While the actual accelerator logic can be synthesized into IP cores from a high-level language using tools such as Vivado HLS, no tools currently exist to automatically compose multiple heterogeneous accelerator cores into a unified Hardware Thread pool, including the assembly of external control and memory interfaces. Thread Pool Composer closes the gap in the design flow between high-level synthesis and general-purpose IP integration by automatically composing Hardware Thread pools and their external interfaces from high-level descriptions and opening them to software using a common API.
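
    As a rough illustration of the "common API" idea, the sketch below shows what host-side use of such a composed Hardware Thread pool could look like. All names (hwpool_launch, hwpool_wait, kernel IDs) are hypothetical placeholders, not the actual Thread Pool Composer interface, and the launch is simulated in software rather than writing to the pool's memory-mapped control registers.

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical host-side view of a composed Hardware Thread pool.
       * These names are illustrative only and are NOT the Thread Pool
       * Composer API; the launch/wait pair is simulated in software. */
      typedef struct {
          uint32_t kernel_id;   /* which accelerator core (Hardware Thread type) */
          uint32_t args[4];     /* argument registers the core would receive     */
          int      done;
      } hw_job_t;

      static int hwpool_launch(hw_job_t *job, uint32_t kernel_id,
                               const uint32_t *args, size_t n)
      {
          job->kernel_id = kernel_id;
          memcpy(job->args, args, n * sizeof *args);
          /* Real flow: write args to the slot's control registers and start it. */
          job->done = 1;        /* simulated: completes immediately */
          return 0;
      }

      static int hwpool_wait(const hw_job_t *job)
      {
          /* Real flow: block on an interrupt or poll a status register. */
          return job->done ? 0 : -1;
      }

      int main(void)
      {
          hw_job_t job;
          uint32_t args[2] = { 1024u, 42u };   /* e.g. a buffer length and a seed */

          if (hwpool_launch(&job, /*kernel_id=*/7, args, 2) != 0 ||
              hwpool_wait(&job) != 0) {
              fprintf(stderr, "hardware thread job failed\n");
              return EXIT_FAILURE;
          }
          puts("hardware thread job completed");
          return 0;
      }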

Jens Korinth - One of the best experts on this subject based on the ideXlab platform.

  • An Open-Source Tool Flow for the Composition of Reconfigurable Hardware Thread Pool Architectures
    2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015
    Co-Authors: Jens Korinth, David De La Chevallerie, Andreas Koch
    Abstract:

    With heterogeneous parallel computing becoming more accessible from general-purpose languages, such as directive-enhanced C/C++ or X10, it is now profitable to exploit the highly energy-efficient operation of reconfigurable accelerators in such frameworks. A common paradigm to present the accelerator to the programmer is as a pool of individual Threads, each executed on dedicated Hardware. While the actual accelerator logic can be synthesized into IP cores from a high-level language using tools such as Vivado HLS, no tools currently exist to automatically compose multiple heterogeneous accelerator cores into a unified Hardware Thread pool, including the assembly of external control and memory interfaces. Thread Pool Composer closes the gap in the design flow between high-level synthesis and general-purpose IP integration by automatically composing Hardware Thread pools and their external interfaces from high-level descriptions and opening them to software using a common API.

David De La Chevallerie - One of the best experts on this subject based on the ideXlab platform.

  • An Open-Source Tool Flow for the Composition of Reconfigurable Hardware Thread Pool Architectures
    2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015
    Co-Authors: Jens Korinth, David De La Chevallerie, Andreas Koch
    Abstract:

    With heterogeneous parallel computing becoming more accessible from general-purpose languages, such as directive-enhanced C/C++ or X10, it is now profitable to exploit the highly energy-efficient operation of reconfigurable accelerators in such frameworks. A common paradigm to present the accelerator to the programmer is as a pool of individual Threads, each executed on dedicated Hardware. While the actual accelerator logic can be synthesized into IP cores from a high-level language using tools such as Vivado HLS, no tools currently exist to automatically compose multiple heterogeneous accelerator cores into a unified Hardware Thread pool, including the assembly of external control and memory interfaces. Thread Pool Composer closes the gap in the design flow between high-level synthesis and general-purpose IP integration by automatically composing Hardware Thread pools and their external interfaces from high-level descriptions and opening them to software using a common API.

J. Stevens - One of the best experts on this subject based on the ideXlab platform.

  • FPL - Supporting High Level Language Semantics within Hardware Resident Threads
    2007 International Conference on Field Programmable Logic and Applications, 2007
    Co-Authors: E. Anderson, W. Peck, J. Stevens, Jason Agron, F. Baijot, S. Warn, David Andrews
    Abstract:

    The paper presents the new Hardware Thread interface (HWTI), a meaningful and semantically rich target for a high-level language to Hardware description language translator. The HWTI provides a Hardware Thread with the same Thread system calls available to software Threads, a fast global distributed memory, support for pointers, a generalized function call model including recursion, local variable declaration, dynamic memory allocation, and a remote procedure call model that gives Hardware Threads access to any library function.
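
    As a software-only illustration of the semantics such an interface targets, the following plain C/pthreads thread body uses mutex system calls, a pointer into shared memory, and a library function call; the claim in the abstract is that an equivalent Hardware Thread could use the same operations through the HWTI. The code below runs on a CPU and is not the HWTI itself.

      #include <math.h>
      #include <pthread.h>
      #include <stdio.h>

      /* Shared state visible to all threads; conceptually the "fast global
       * distributed memory" a hardware-resident thread would also access. */
      static double          shared_sum = 0.0;
      static pthread_mutex_t sum_lock   = PTHREAD_MUTEX_INITIALIZER;

      /* Thread body restricted to operations the HWTI abstract lists:
       * thread system calls, pointer dereference, and a library call. */
      static void *worker(void *arg)
      {
          const double *x = arg;            /* pointer into shared memory */
          double local = sqrt(*x);          /* library function call      */

          pthread_mutex_lock(&sum_lock);    /* thread system call         */
          shared_sum += local;
          pthread_mutex_unlock(&sum_lock);
          return NULL;
      }

      int main(void)
      {
          double inputs[4] = { 1.0, 4.0, 9.0, 16.0 };
          pthread_t t[4];

          for (int i = 0; i < 4; i++)
              pthread_create(&t[i], NULL, worker, &inputs[i]);
          for (int i = 0; i < 4; i++)
              pthread_join(t[i], NULL);

          printf("sum of square roots: %.1f\n", shared_sum);   /* 1+2+3+4 = 10.0 */
          return 0;
      }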

  • FPL - HybridThreads Compiler: Generation of Application Specific Hardware Thread Cores from C
    2007 International Conference on Field Programmable Logic and Applications, 2007
    Co-Authors: J. Stevens
    Abstract:

    The hThreads group is developing the hybridThreads compiler (HTC) to satisfy the need for a C compiler that can generate Hardware Threads. Compiling C-like languages to Hardware has been studied a number of times. The goal of HTC differs from that of past projects, which focused on creating and optimizing Hardware-based co-processors that interleave execution of a single Thread with the CPU. Our goal is to compile unmodified C into Hardware Threads that can run without depending on the CPU, gaining our speedup from physical Thread-level parallelism (TLP). We are not attempting to use C as a general-purpose Hardware description language. Instead, we use the FPGA with hThreads as just another compiler target architecture.
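
    The kind of unmodified C the abstract refers to would look like the ordinary pthreads worker below: no hardware annotations, just a data-parallel thread function. Under a flow like HTC, the same function could be compiled into a Hardware Thread running alongside CPU threads; the example itself is illustrative and not taken from the paper.

      #include <pthread.h>
      #include <stdio.h>

      #define N        1024
      #define NWORKERS 4

      static int  a[N], b[N];
      static long partial[NWORKERS];

      /* Plain C thread body: each worker sums one slice of an element-wise
       * product. Physical thread-level parallelism comes from running the
       * workers concurrently, whether on CPUs or as Hardware Threads. */
      static void *dot_worker(void *arg)
      {
          long id = (long)arg;
          long sum = 0;
          for (int i = (int)id * (N / NWORKERS); i < (int)(id + 1) * (N / NWORKERS); i++)
              sum += (long)a[i] * b[i];
          partial[id] = sum;
          return NULL;
      }

      int main(void)
      {
          for (int i = 0; i < N; i++) { a[i] = 1; b[i] = 2; }

          pthread_t t[NWORKERS];
          for (long id = 0; id < NWORKERS; id++)
              pthread_create(&t[id], NULL, dot_worker, (void *)id);

          long total = 0;
          for (long id = 0; id < NWORKERS; id++) {
              pthread_join(t[id], NULL);
              total += partial[id];
          }
          printf("dot product: %ld\n", total);   /* expect 2048 */
          return 0;
      }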

  • ERSA - Memory Hierarchy for MCSoPC MultiThreaded Systems.
    2007
    Co-Authors: E. Anderson, W. Peck, J. Stevens, Jason Agron, F. Baijot, Seth Warn, David Andrews
    Abstract:

    Thanks to advancements in fabrication techniques, it will soon be possible to place 10s if not 100s of cores on a single hybrid CPU/FPGA reconfigurable chip. This has led to a new field of study, namely Multi-Core Systems on a Programmable Chip (MCSoPC). The problems being studied with MCSoPC are not unlike the problems studied 20 years ago when multiple-CPU parallel processing networks were first organized to solve computationally intensive tasks. With reconfigurable computing the problems are more complex, as Hardware/software co-design engineers would like to create hybrid hard, soft, and custom core solutions, all specified from within a software-centric programming model. This paper introduces a new distributed shared memory model that enables the existing HybridThreads Hardware Thread Interface (HWTI) to provide a function call stack and heap for Hardware Threads. These features in turn provide a generalized model for invoking functions, support for recursion, and dynamic memory allocation. These services were added to the HWTI not only to give Hardware Threads better performance, but also to provide a meaningful and semantically rich target for a HLL-to-HDL translator.
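
    The two C features the extended HWTI is meant to make usable inside a Hardware Thread are recursion (which needs a per-thread call stack) and dynamic allocation (which needs a heap). The snippet below, ordinary host C written for this summary, only illustrates which language constructs such a thread could then rely on.

      #include <stdio.h>
      #include <stdlib.h>

      struct node { int value; struct node *next; };

      /* Recursively build the list n, n-1, ..., 1 on the heap:
       * recursion exercises the call stack, malloc exercises the heap. */
      static struct node *build(int n)
      {
          if (n == 0)
              return NULL;
          struct node *head = malloc(sizeof *head);
          if (!head)
              return NULL;
          head->value = n;
          head->next  = build(n - 1);     /* recursive call */
          return head;
      }

      /* Recursively sum and release the list. */
      static int sum_and_free(struct node *head)
      {
          if (!head)
              return 0;
          int rest = sum_and_free(head->next);
          int v = head->value;
          free(head);
          return v + rest;
      }

      int main(void)
      {
          printf("sum 1..10 = %d\n", sum_and_free(build(10)));   /* 55 */
          return 0;
      }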

  • FCCM - Enabling a Uniform Programming Model Across the Software/Hardware Boundary
    2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2006
    Co-Authors: E. Anderson, W. Peck, J. Stevens, Jason Agron, F. Baijot, Ed Komp, Ron Sass, David Andrews
    Abstract:

    In this paper, we present hThreads, a unifying programming model for specifying application Threads running within a hybrid CPU/FPGA system. Threads are specified from a single pThreads multiThreaded application program and compiled to run on the CPU or synthesized to run on the FPGA. The hThreads system, in general, is unique within the reconfigurable computing community as it abstracts the CPU/FPGA components into a unified custom Threaded multiprocessor architecture platform. To support the abstraction of the CPU/FPGA component boundary, we have created the Hardware Thread interface (HWTI) component that frees the designer from having to specify and embed platform-specific instructions to form customized Hardware/software interactions. Instead, the Hardware Thread interface supports the generalized pThreads API semantics, and allows passing of abstract data types between Hardware and software Threads. Thus the Hardware Thread interface provides an abstract, platform-independent compilation target that enables Thread and instruction-level parallelism across the software/Hardware boundary.
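
    A minimal sketch of the uniform model, assuming nothing beyond standard pThreads: an abstract data type is passed by reference to a thread created with pthread_create. The abstract describes the HWTI as supporting the same pThreads API semantics and passing abstract data types across the boundary, so the same call could target a hardware-resident thread; the code here runs purely in software.

      #include <pthread.h>
      #include <stdio.h>

      /* Abstract data type handed to a thread by reference. */
      typedef struct {
          const char *label;
          int         values[4];
          int         result;
      } work_item_t;

      static void *accumulate(void *arg)
      {
          work_item_t *item = arg;
          item->result = 0;
          for (int i = 0; i < 4; i++)
              item->result += item->values[i];
          return NULL;
      }

      int main(void)
      {
          work_item_t item = { .label = "demo", .values = { 3, 1, 4, 1 } };
          pthread_t t;

          pthread_create(&t, NULL, accumulate, &item);   /* same call whether the thread is SW- or HW-resident */
          pthread_join(t, NULL);

          printf("%s: %d\n", item.label, item.result);   /* demo: 9 */
          return 0;
      }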

Andreas Herkersdorf - One of the best experts on this subject based on the ideXlab platform.

  • TCU: A Multi-Objective Hardware Thread Mapping Unit for HPC Clusters
    ISC, Lecture Notes in Computer Science, 2016
    Co-Authors: Ravi Kumar Pujari, Thomas Wild, Andreas Herkersdorf
    Abstract:

    Simultaneously meeting multiple, partially orthogonal optimization targets during Thread scheduling on HPC and manycore platforms, such as maximizing CPU performance, meeting deadlines of time-critical tasks, minimizing power, and securing thermal resilience, is a major challenge because of the associated scalability and Thread management overhead. We tackle these challenges by introducing the Thread Control Unit (TCU), a configurable, low-latency, low-overhead Hardware Thread mapper in the compute nodes of an HPC cluster. The TCU takes various sensor information into account and can map Threads to 4–16 CPUs of a compute node within a small and bounded number of clock cycles in a round-robin, single-, or multi-objective manner. The TCU design can consider not just load balancing or performance criteria but also physical constraints like temperature limits, power budgets, and reliability aspects. Evaluations of different mapping policies show that multi-objective Thread mapping provides about 10 to 40% lower mapping latency for periodic workloads compared to single-objective or round-robin policies. For bursty workloads under high load conditions, a 20% reduction is achieved.
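
    A toy software model of the selection step the abstract describes: each candidate CPU's load, temperature, and power readings are folded into one score and the best CPU wins. The weights, limits, and formula are invented for illustration; the actual TCU evaluates its policy in hardware within a bounded number of cycles.

      #include <stdio.h>

      #define NCPU 8

      /* Per-CPU sensor snapshot, standing in for the "various sensor information". */
      struct cpu_state {
          float load;      /* 0.0 idle .. 1.0 fully loaded */
          float temp_c;    /* current die temperature      */
          float power_w;   /* current power draw           */
      };

      /* Multi-objective score, lower is better. A round-robin policy ignores
       * all three terms; a single-objective policy would keep only one. */
      static float map_score(const struct cpu_state *c,
                             float temp_limit_c, float power_budget_w)
      {
          const float w_load = 0.5f, w_temp = 0.3f, w_power = 0.2f;   /* illustrative weights */
          return w_load  * c->load
               + w_temp  * (c->temp_c  / temp_limit_c)
               + w_power * (c->power_w / power_budget_w);
      }

      static int pick_cpu(const struct cpu_state cpus[NCPU])
      {
          int best = 0;
          float best_score = map_score(&cpus[0], 85.0f, 10.0f);
          for (int i = 1; i < NCPU; i++) {
              float s = map_score(&cpus[i], 85.0f, 10.0f);
              if (s < best_score) { best_score = s; best = i; }
          }
          return best;
      }

      int main(void)
      {
          struct cpu_state cpus[NCPU] = {
              { 0.9f, 70.0f, 8.0f }, { 0.2f, 55.0f, 3.0f }, { 0.4f, 80.0f, 6.0f },
              { 0.2f, 60.0f, 9.0f }, { 0.6f, 50.0f, 4.0f }, { 0.8f, 65.0f, 5.0f },
              { 0.3f, 58.0f, 3.5f }, { 0.5f, 72.0f, 7.0f },
          };
          printf("map next thread to CPU %d\n", pick_cpu(cpus));
          return 0;
      }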

  • A Hardware-Based Multi-Objective Thread Mapper for Tiled Manycore Architectures
    International Conference on Computer Design, 2015
    Co-Authors: Ravi Kumar Pujari, Thomas Wild, Andreas Herkersdorf
    Abstract:

    Thread mapping is typically performed as an integral part of cooperative or pre-emptive operating system (OS) scheduling in order to share the processor core(s) among competing applications. Schedulers usually follow a single-objective performance optimization, such as maximizing core utilization or satisfying deadlines by prioritizing Threads. Meeting multiple orthogonal objectives, like performance vs. power or thermal resilience, in the era of manycore processors is a challenge because of the associated scalability and Thread management overhead. We tackle these challenges by employing a two-stage Thread management strategy. In the first stage (not covered in this short paper), Threads are assigned to regions or compute tiles. For the second stage we introduce in this paper the TCU (Thread Control Unit), a configurable, low-latency, low-overhead Hardware Thread mapper that takes various runtime sensor parameters into account. It can map Threads within a small and bounded number of clock cycles in a round-robin, single-, or multi-objective manner. The TCU is designed to consider not just load balancing or performance criteria but also physical constraints like power budgets, temperature limits, and reliability aspects. The TCU macro achieves 150K Thread mappings per second on a tiled MPSoC FPGA prototype while operating at a moderate 50 MHz. Evaluations of different mapping policies show that multi-objective Thread mapping provides about 10 to 40% lower mapping latency for periodic and bursty traffic compared to single-objective or round-robin schemes. FPGA and ASIC syntheses reveal a 9% Hardware overhead for the TCU on a four-core compute tile.
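
    For scale, the reported numbers work out to roughly 50,000,000 cycles per second divided by 150,000 mappings per second, i.e. about 333 clock cycles of sustained throughput per mapping at 50 MHz.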