OpenCL

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 10320 Experts worldwide ranked by ideXlab platform

Jaejin Lee - One of the best experts on this subject based on the ideXlab platform.

  • ISCA - SOFF: An OpenCL High-Level Synthesis Framework for FPGAs
    2020 ACM IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020
    Co-Authors: Heehoon Kim, Jee-soo Lee, Jaejin Lee
    Abstract:

    Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor performance and usability. This paper proposes a high-level synthesis framework of OpenCL for FPGAs, called SOFF. It automatically synthesizes a datapath to execute many OpenCL kernel threads in a pipelined manner. It also synthesizes an efficient memory subsystem for the datapath based on the characteristics of OpenCL kernels. Unlike previous high-level synthesis techniques, we propose a formal way to handle variable latency instructions, complex control flows, OpenCL barriers, and atomic operations that appear in real-world OpenCL kernels. SOFF is the first OpenCL framework that correctly compiles and executes all applications in the SPEC ACCEL benchmark suite except three applications that require more FPGA resources than are available. In addition, SOFF achieves a speedup of 1.33 over Intel FPGA SDK for OpenCL without any explicit user annotation or source code modification.

  • SnuCL: A unified OpenCL framework for heterogeneous clusters
    Advances in GPU Research and Practice, 2017
    Co-Authors: Jaejin Lee, Woo-kyun Jung, Hyunyong Kim, J. Kim, Yun Jong Lee, J.-g. Park
    Abstract:

    Open Computing Language (OpenCL) is a programming model for heterogeneous parallel computing systems. OpenCL provides a common abstraction layer across general-purpose CPUs and different types of accelerators. Programmers write an OpenCL application once and then can run it on any OpenCL-compliant system. However, to target a heterogeneous cluster, programmers must use OpenCL in combination with a communication library. This chapter introduces SnuCL, a freely available, open-source OpenCL framework for heterogeneous clusters. SnuCL provides the programmer with an illusion of a single, unified OpenCL platform image for the cluster. SnuCL allows the OpenCL application to utilize compute devices in a compute node as though they were in the host node. In addition, SnuCL integrates multiple OpenCL platforms from different vendors into a single platform. It enables an OpenCL application to share OpenCL objects between compute devices from different vendors. As a result, SnuCL achieves high performance and ease of programming for heterogeneous systems.

  • OpenCL framework for ARM processors with NEON support
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
    Co-Authors: Won J Jeon, Wookeun Jung, Gordon Taft, Jaejin Lee
    Abstract:

    The state-of-the-art ARM processors provide multiple cores and SIMD instructions. OpenCL is a promising programming model for utilizing such parallel processing capability because of its SPMD programming model and built-in vector support. Moreover, it provides portability between multicore ARM processors and accelerators in embedded systems. In this paper, we introduce the design and implementation of an efficient OpenCL framework for multicore ARM processors. Computational tasks in a program are implemented as OpenCL kernels and run on all CPU cores in parallel by our OpenCL framework. Vector operations and built-in functions in OpenCL kernels are optimized using the NEON SIMD instruction set. We evaluate our OpenCL framework using 37 benchmark applications. The result shows that our approach is effective and promising.
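
As a rough illustration of the SPMD model this abstract refers to, the sketch below emulates in plain Python how one kernel body runs once per work-item, with work-groups handed to a pool of workers standing in for CPU cores. The names `vec_add_kernel`, `run_work_group`, and `run_ndrange`, the work-group size, and the worker count are all hypothetical and are not taken from the paper's framework. Real kernels are written in OpenCL C, and NEON vectorization is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor


def vec_add_kernel(global_id, a, b, out):
    """Kernel body: every work-item runs this same code on its own index."""
    out[global_id] = a[global_id] + b[global_id]


def run_work_group(group_id, local_size, global_size, kernel, args):
    """Run all work-items of one work-group sequentially on one worker."""
    start = group_id * local_size
    end = min(start + local_size, global_size)  # last group may be partial
    for global_id in range(start, end):
        kernel(global_id, *args)


def run_ndrange(kernel, global_size, local_size, args, num_workers=4):
    """Split a 1-D index space into work-groups and spread them over workers."""
    num_groups = (global_size + local_size - 1) // local_size
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(run_work_group, g, local_size, global_size, kernel, args)
            for g in range(num_groups)
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker


# Hypothetical input data: 10 elements, work-groups of 4 work-items.
a = [float(i) for i in range(10)]
b = [2.0 * i for i in range(10)]
out = [0.0] * 10
run_ndrange(vec_add_kernel, global_size=10, local_size=4, args=(a, b, out))
```

Each output index is written by exactly one work-item, so the work-groups need no synchronization in this example.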

  • WPMVP@PPoPP - OpenCL framework for ARM processors with NEON support
    Proceedings of the 2014 Workshop on programming models for SIMD Vector processing - WPMVP '14, 2014
    Co-Authors: Won J Jeon, Wookeun Jung, Gordon Taft, Jaejin Lee
    Abstract:

    The state-of-the-art ARM processors provide multiple cores and SIMD instructions. OpenCL is a promising programming model for utilizing such parallel processing capability because of its SPMD programming model and built-in vector support. Moreover, it provides portability between multicore ARM processors and accelerators in embedded systems. In this paper, we introduce the design and implementation of an efficient OpenCL framework for multicore ARM processors. Computational tasks in a program are implemented as OpenCL kernels and run on all CPU cores in parallel by our OpenCL framework. Vector operations and built-in functions in OpenCL kernels are optimized using the NEON SIMD instruction set. We evaluate our OpenCL framework using 37 benchmark applications. The result shows that our approach is effective and promising.

  • FPT - An OpenCL optimizing compiler for reconfigurable processors
    2013 International Conference on Field-Programmable Technology (FPT), 2013
    Co-Authors: Jeongho Nah, Jun Lee, Hong-june Kim, Jin-seok Lee, Seok Joong Hwang, Dong-hoon Yoo, Jaejin Lee
    Abstract:

    This paper presents simple and efficient optimization techniques for an OpenCL compiler that targets reconfigurable processors. The target architecture consists of a general-purpose processor core and an embedded reconfigurable accelerator with vector units. The accelerator is able to switch its architecture between the VLIW mode and the Coarse Grained Reconfigurable Array (CGRA) mode to achieve high performance. One big problem of this architecture is programming difficulty, and OpenCL can be a good solution. However, since OpenCL does not guarantee performance portability, hardware-dependent optimization is still necessary. Hence, we develop an OpenCL compiler framework that exploits the mode switching capability and vector units. To measure the effectiveness of the techniques, we have implemented the OpenCL framework and evaluated their performance with fourteen OpenCL benchmark applications.

Jenq Kuen Lee - One of the best experts on this subject based on the ideXlab platform.

  • ViennaCL++: Enable TensorFlow/Eigen via ViennaCL with OpenCL C++ flow
    International Workshop on OpenCL, 2018
    Co-Authors: Tailiang Chen, Shihhuan Chien, Jenq Kuen Lee
    Abstract:

    This paper presents ViennaCL++, an OpenCL C++ kernel library for the Vienna Computing Library (ViennaCL) combined with the TensorFlow/Eigen library to enable acceleration and optimization of linear algebraic computing. Previously, TensorFlow would invoke Eigen for solvers. To enable the OpenCL flow, one can invoke Eigen via ViennaCL to generate kernel programs for GPU computation. In order to support the features of the latest specification, the linear algebraic kernel library is migrated to OpenCL C++ with C++ features in ViennaCL++ to construct the OpenCL flow for TensorFlow and its underlying computational library Eigen. The software flow is based on the state-of-the-art specification of OpenCL and the OpenCL C++ kernel language, as well as the SPIR-V binary intermediate representation. In the experiments, ViennaCL++, which includes C++ classes and the SPIR-V flow, achieves 8 times and 49 times speedup for BLAS2 and BLAS3 operations compared to the Eigen library on Intel x86_64 hardware. Overall, these results indicate that the performance of ViennaCL++ runtime execution with the OpenCL C++ and SPIR-V flow is similar to that of the traditional OpenCL C flow. Note that the Intel OpenCL 2.1 compiler supports most of the Khronos OpenCL 2.2 (OpenCL C++) language features needed for the experiment.

  • HPCC - Enable OpenCL Compiler with Open64 Infrastructures
    2011 IEEE International Conference on High Performance Computing and Communications, 2011
    Co-Authors: Yu-te Lin, Shao-chung Wang, Wen-li Shih, Brian Kun-yuan Hsieh, Jenq Kuen Lee
    Abstract:

    As microprocessors evolve into heterogeneous architectures with multi-cores of MPUs and GPUs, programming model support becomes important for programming such architectures. To address this issue, OpenCL is proposed. Currently, most OpenCL implementations use LLVM as their infrastructure. This presents an opportunity to demonstrate whether OpenCL can be effectively implemented on other compiler infrastructures. For example, Open64, which is another open source compiler and known to generate efficient code for microprocessors, can contribute further to performance improvements and enhance the adoption of heterogeneous computing based on OpenCL. In this paper, we describe the flow to enable an OpenCL compiler based on Open64 infrastructures for ATI GPUs. Our work includes the extension of the front-end parser for OpenCL, the generation of high-level intermediate representations with OpenCL linguistics, performing high-level optimization, and finally applying OpenCL-specific optimization for code generation. Preliminary experimental results show that our compiler based on Open64 is able to generate efficient code for OpenCL programs.

Luca Valcarenghi - One of the best experts on this subject based on the ideXlab platform.

  • Is OpenCL Driven Reconfigurable Hardware Suitable for Virtualising 5G Infrastructure?
    IEEE Transactions on Network and Service Management, 2020
    Co-Authors: Federico Civerchia, Koteswararao Kondepu, Luca Maggiani, Maxime Pelcat, Piero Castoldi, Luca Valcarenghi
    Abstract:

    The Open Computing Language (OpenCL) is increasingly adopted for programming processors with reconfigurable hardware acceleration. The 5G telecommunication infrastructure, imposing strong latency constraints on the managed communications, may benefit from OpenCL-designed accelerated processing. This paper presents the first study to evaluate OpenCL hardware acceleration in the context of a 5G base station physical layer. The implementation and optimization process to accelerate the Orthogonal Frequency Division Multiplexing (OFDM) part of the 5G downlink is conducted on a high-end Field Programmable Gate Array (FPGA). We show that the proposed OpenCL implementation complies with the 5G processing timing requirements since the computation time is consistent with the present 5G deployment. However, to be suitable for 5G, the OpenCL platform must improve the data transfer latency between hardware and software. Moreover, a further enhancement for the OpenCL implementation is to improve the code by means of OpenCL optimization techniques. In this way, the performance can be further improved with respect to optimized software on vectorized high-end processors.
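
For readers unfamiliar with the OFDM step mentioned in this abstract, the following is a minimal textbook sketch, not the paper's FPGA implementation: symbols are placed on subcarriers, an inverse DFT produces time-domain samples, and a cyclic prefix is prepended. The function names, the 8-subcarrier size, the prefix length, and the QPSK test symbols are all assumptions chosen for illustration. A naive O(N^2) DFT is used so that the example needs only the standard library.

```python
import cmath


def idft(freq):
    """Naive O(N^2) inverse DFT: subcarrier symbols -> time-domain samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def dft(time):
    """Naive O(N^2) forward DFT: time-domain samples -> subcarrier symbols."""
    n = len(time)
    return [
        sum(time[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def ofdm_modulate(symbols, cp_len):
    """Build one OFDM symbol: inverse DFT, then prepend the cyclic prefix."""
    if not 0 < cp_len <= len(symbols):
        raise ValueError("cp_len must be between 1 and the number of subcarriers")
    samples = idft(symbols)
    return samples[-cp_len:] + samples  # prefix is a copy of the tail


def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and recover the subcarrier symbols."""
    return dft(samples[cp_len:])


# Hypothetical input: QPSK symbols on 8 subcarriers, cyclic prefix of 2 samples.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
tx = ofdm_modulate(qpsk, cp_len=2)
rx = ofdm_demodulate(tx, cp_len=2)
```

Over an ideal channel, as here, `rx` matches `qpsk` up to floating-point rounding. A practical implementation would replace the naive transforms with an FFT.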

Paul Chow - One of the best experts on this subject based on the ideXlab platform.

  • Evaluating shared virtual memory in an OpenCL framework for embedded systems on FPGAs
    Reconfigurable Computing and FPGAs, 2015
    Co-Authors: Vincent Mirian, Paul Chow
    Abstract:

    There is now significant interest in OpenCL for FPGAs because it is the first time the FPGA vendors have provided a programming model and a computing platform with integrated high-level synthesis. OpenCL is intended for heterogeneous platforms, not just FPGAs, and the standard continues to evolve. Recently, OpenCL has introduced Shared Virtual Memory (SVM) with the goal of simplifying the programming model by allowing hosts and devices to access the same memory space more easily. In this paper, we propose different approaches to implement SVM in an OpenCL framework built specifically to study OpenCL in the context of embedded applications running on FPGAs. We evaluate these different approaches and compare the trade-offs between an OpenCL framework with SVM support and without SVM support. Our results show that the approach that implements the virtual address to physical address translation with a dedicated Memory Management Unit (MMU) performs better than the other approaches. Our results also show that, for input sizes less than 1MB for a vector addition benchmark, the OpenCL framework with SVM support performs better than the OpenCL framework without SVM support until the SVM handling in the kernel starts to dominate.
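
The virtual-to-physical address translation mentioned in this abstract can be illustrated with a small software model. This is only a sketch of the general idea under stated assumptions: a single-level page table and a 4096-byte page size, neither of which is taken from the paper. The `PageTable` class and its methods are hypothetical names, and the paper's dedicated MMU is hardware, not Python.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, not taken from the paper


class PageTable:
    """Toy single-level page table: virtual page number -> physical frame number."""

    def __init__(self):
        self._frames = {}

    def map(self, vpn, pfn):
        """Record that virtual page `vpn` is backed by physical frame `pfn`."""
        self._frames[vpn] = pfn

    def translate(self, vaddr):
        """Split the address into page number and offset, then swap in the frame."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self._frames:
            raise KeyError(f"page fault: virtual page {vpn} is not mapped")
        return self._frames[vpn] * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0x10 lives in physical frame 0x2A.
table = PageTable()
table.map(vpn=0x10, pfn=0x2A)
paddr = table.translate(0x10 * PAGE_SIZE + 0x123)  # 0x2A123
```

With SVM, host and device can share the same pointer values, so a device-side access has to go through a lookup of this kind before it reaches physical memory.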

  • UT-OCL: an OpenCL framework for embedded systems using Xilinx FPGAs
    Reconfigurable Computing and FPGAs, 2015
    Co-Authors: Vincent Mirian, Paul Chow
    Abstract:

    FPGA vendors now include hardened IPs to form a system-on-chip (SoC), making it easier to build embedded systems. However, programming and integrating hardware accelerators (devices) into these systems present a challenge. The OpenCL standard has become accepted as a good programming model for managing devices, or hardware accelerators in the context of embedded systems on FPGAs, due to its rich set of constructs. OpenCL has also caught the attention of FPGA vendors for use in high-level synthesis (HLS). While commercial OpenCL frameworks are now emerging, there is a need for an open-source OpenCL framework that facilitates the exploration of the overall system architecture and software, as well as the implementation and architectures of the task-level parallel devices. This would enable exploration of concepts that can improve current architectures as well as allow the study of features that are not within the current standard. This paper presents UT-OCL, an OpenCL framework for embedded systems using FPGAs. The framework is composed of a hardware system and its necessary software counterparts, which together form an embedded Linux system augmented to run OpenCL applications within a single FPGA. This paper describes the challenges with implementing an OpenCL framework for embedded systems on FPGAs, and presents an OpenCL implementation that is compliant with OpenCL 2.0. This framework is intended for use as a platform to explore architectures for hosting OpenCL applications, implementations of OpenCL features and to study potential new features for OpenCL. Although the current trend is to use OpenCL in high-level synthesis targeting FPGAs, it is not the focus of this paper.

  • ReConFig - Evaluating shared virtual memory in an OpenCL framework for embedded systems on FPGAs
    2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2015
    Co-Authors: Vincent Mirian, Paul Chow
    Abstract:

    There is now significant interest in OpenCL for FPGAs because it is the first time the FPGA vendors have provided a programming model and a computing platform with integrated high-level synthesis. OpenCL is intended for heterogeneous platforms, not just FPGAs, and the standard continues to evolve. Recently, OpenCL has introduced Shared Virtual Memory (SVM) with the goal of simplifying the programming model by allowing hosts and devices to access the same memory space more easily. In this paper, we propose different approaches to implement SVM in an OpenCL framework built specifically to study OpenCL in the context of embedded applications running on FPGAs. We evaluate these different approaches and compare the trade-offs between an OpenCL framework with SVM support and without SVM support. Our results show that the approach that implements the virtual address to physical address translation with a dedicated Memory Management Unit (MMU) performs better than the other approaches. Our results also show that, for input sizes less than 1MB for a vector addition benchmark, the OpenCL framework with SVM support performs better than the OpenCL framework without SVM support until the SVM handling in the kernel starts to dominate.

  • ReConFig - UT-OCL: an OpenCL framework for embedded systems using Xilinx FPGAs
    2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2015
    Co-Authors: Vincent Mirian, Paul Chow
    Abstract:

    FPGA vendors now include hardened IPs to form a system-on-chip (SoC), making it easier to build embedded systems. However, programming and integrating hardware accelerators (devices) into these systems present a challenge. The OpenCL standard has become accepted as a good programming model for managing devices, or hardware accelerators in the context of embedded systems on FPGAs, due to its rich set of constructs. OpenCL has also caught the attention of FPGA vendors for use in high-level synthesis (HLS). While commercial OpenCL frameworks are now emerging, there is a need for an open-source OpenCL framework that facilitates the exploration of the overall system architecture and software, as well as the implementation and architectures of the task-level parallel devices. This would enable exploration of concepts that can improve current architectures as well as allow the study of features that are not within the current standard. This paper presents UT-OCL, an OpenCL framework for embedded systems using FPGAs. The framework is composed of a hardware system and its necessary software counterparts, which together form an embedded Linux system augmented to run OpenCL applications within a single FPGA. This paper describes the challenges with implementing an OpenCL framework for embedded systems on FPGAs, and presents an OpenCL implementation that is compliant with OpenCL 2.0. This framework is intended for use as a platform to explore architectures for hosting OpenCL applications, implementations of OpenCL features and to study potential new features for OpenCL. Although the current trend is to use OpenCL in high-level synthesis targeting FPGAs, it is not the focus of this paper.

Federico Civerchia - One of the best experts on this subject based on the ideXlab platform.

  • Is OpenCL Driven Reconfigurable Hardware Suitable for Virtualising 5G Infrastructure?
    IEEE Transactions on Network and Service Management, 2020
    Co-Authors: Federico Civerchia, Koteswararao Kondepu, Luca Maggiani, Maxime Pelcat, Piero Castoldi, Luca Valcarenghi
    Abstract:

    The Open Computing Language (OpenCL) is increasingly adopted for programming processors with reconfigurable hardware acceleration. The 5G telecommunication infrastructure, imposing strong latency constraints on the managed communications, may benefit from OpenCL-designed accelerated processing. This paper presents the first study to evaluate OpenCL hardware acceleration in the context of a 5G base station physical layer. The implementation and optimization process to accelerate the Orthogonal Frequency Division Multiplexing (OFDM) part of the 5G downlink is conducted on a high-end Field Programmable Gate Array (FPGA). We show that the proposed OpenCL implementation complies with the 5G processing timing requirements since the computation time is consistent with the present 5G deployment. However, to be suitable for 5G, the OpenCL platform must improve the data transfer latency between hardware and software. Moreover, a further enhancement for the OpenCL implementation is to improve the code by means of OpenCL optimization techniques. In this way, the performance can be further improved with respect to optimized software on vectorized high-end processors.