OpenCL Implementation

The Experts below are selected from a list of 3162 Experts worldwide ranked by ideXlab platform

E. Baron - One of the best experts on this subject based on the ideXlab platform.

  • A 3D radiative transfer framework: VIII. OpenCL implementation
    Astronomy & Astrophysics, 2011
    Co-Authors: Peter H. Hauschildt, E. Baron
    Abstract:

    Aims. We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. Methods. We implemented the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal (x, y) plane, including the construction of the nearest-neighbor Λ* and the operator splitting step. Results. We present the results of both a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. Conclusions. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.

  • A 3D radiative transfer framework: VIII. OpenCL implementation
    arXiv: Instrumentation and Methods for Astrophysics, 2011
    Co-Authors: Peter H. Hauschildt, E. Baron
    Abstract:

    We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. We implement the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal $(x,y)$ plane, including the construction of the nearest-neighbor $\Lambda^*$ and the operator splitting step. We present the results of a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.
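
The abstracts above mention periodic boundary conditions in the horizontal (x, y) plane and a nearest-neighbor Λ*. As a rough illustration of the indexing this implies, and not the authors' kernel, the following Python sketch lists the cells of a 3x3x3 stencil around one voxel, wrapping in x and y and truncating in z. The function name `neighbor_indices`, the stencil size, and the row-major flattened index are assumptions made for this example.

```python
def neighbor_indices(ix, iy, iz, nx, ny, nz):
    """Flattened indices of the 3x3x3 stencil around voxel (ix, iy, iz).

    Hypothetical helper: x and y wrap periodically; z does not, so stencil
    cells that fall outside [0, nz) are skipped.
    """
    result = []
    for dz in (-1, 0, 1):
        jz = iz + dz
        if jz < 0 or jz >= nz:
            continue  # no periodicity in the vertical direction
        for dy in (-1, 0, 1):
            jy = (iy + dy) % ny  # periodic wrap in y
            for dx in (-1, 0, 1):
                jx = (ix + dx) % nx  # periodic wrap in x
                # Assumed row-major layout: x fastest, then y, then z.
                result.append((jz * ny + jy) * nx + jx)
    return result


corner = neighbor_indices(0, 0, 0, 4, 4, 3)
interior = neighbor_indices(1, 1, 1, 4, 4, 3)
```

On this 4x4x3 grid the corner voxel keeps two of the three z-layers, while the interior voxel sees the full stencil including itself. In a GPU kernel, the same modular arithmetic would typically be evaluated per work-item.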

Luca Valcarenghi - One of the best experts on this subject based on the ideXlab platform.

  • Is OpenCL Driven Reconfigurable Hardware Suitable for Virtualising 5G Infrastructure?
    IEEE Transactions on Network and Service Management, 2020
    Co-Authors: Federico Civerchia, Koteswararao Kondepu, Luca Maggiani, Maxime Pelcat, Piero Castoldi, Luca Valcarenghi
    Abstract:

    The Open Computing Language (OpenCL) is increasingly adopted for programming processors with reconfigurable hardware acceleration. The 5G telecommunication infrastructure, which imposes strong latency constraints on the managed communications, may benefit from OpenCL-designed accelerated processing. This paper presents the first study to evaluate OpenCL hardware acceleration in the context of a 5G base station physical layer. The implementation and optimization process to accelerate the Orthogonal Frequency Division Multiplexing (OFDM) part of the 5G downlink is conducted on a high-end Field Programmable Gate Array (FPGA). We show that the proposed OpenCL implementation complies with the 5G processing timing requirements, since the computation time is consistent with the present 5G deployment. However, to be suitable for 5G, the OpenCL platform must improve the data transfer latency between hardware and software. Moreover, a further enhancement for the OpenCL implementation is to improve the code by means of OpenCL optimization techniques. In this way, the performance can be further improved with respect to optimized software on vectorized high-end processors.
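
The OFDM stage mentioned in this abstract can be pictured, in its simplest textbook form, as an inverse DFT over the subcarrier symbols followed by a cyclic prefix. The sketch below is a minimal Python illustration of that idea only; it does not reproduce the paper's FPGA design, and the function name `ofdm_modulate`, the four-subcarrier block, and the prefix length are hypothetical choices for the example.

```python
import cmath


def ofdm_modulate(symbols, cp_len):
    """Turn one block of frequency-domain symbols into a time-domain OFDM symbol.

    Uses a naive O(N^2) inverse DFT for clarity (a real system would use an
    IFFT), then prepends a cyclic prefix copied from the last cp_len samples.
    """
    n = len(symbols)
    time = [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    return time[n - cp_len:] + time


# Assumed toy input: only subcarrier 1 carries a symbol.
out = ofdm_modulate([0, 1, 0, 0], cp_len=1)
```

The inverse transform is the part that maps naturally onto parallel hardware, since every output sample can be computed independently of the others.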

Peter H. Hauschildt - One of the best experts on this subject based on the ideXlab platform.

  • A 3D radiative transfer framework: VIII. OpenCL implementation
    Astronomy & Astrophysics, 2011
    Co-Authors: Peter H. Hauschildt, E. Baron
    Abstract:

    Aims. We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. Methods. We implemented the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal (x, y) plane, including the construction of the nearest-neighbor Λ* and the operator splitting step. Results. We present the results of both a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. Conclusions. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.

  • A 3D radiative transfer framework: VIII. OpenCL implementation
    arXiv: Instrumentation and Methods for Astrophysics, 2011
    Co-Authors: Peter H. Hauschildt, E. Baron
    Abstract:

    We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. We implement the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal $(x,y)$ plane, including the construction of the nearest-neighbor $\Lambda^*$ and the operator splitting step. We present the results of a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.

Federico Civerchia - One of the best experts on this subject based on the ideXlab platform.

  • Is OpenCL Driven Reconfigurable Hardware Suitable for Virtualising 5G Infrastructure?
    IEEE Transactions on Network and Service Management, 2020
    Co-Authors: Federico Civerchia, Koteswararao Kondepu, Luca Maggiani, Maxime Pelcat, Piero Castoldi, Luca Valcarenghi
    Abstract:

    The Open Computing Language (OpenCL) is increasingly adopted for programming processors with reconfigurable hardware acceleration. The 5G telecommunication infrastructure, which imposes strong latency constraints on the managed communications, may benefit from OpenCL-designed accelerated processing. This paper presents the first study to evaluate OpenCL hardware acceleration in the context of a 5G base station physical layer. The implementation and optimization process to accelerate the Orthogonal Frequency Division Multiplexing (OFDM) part of the 5G downlink is conducted on a high-end Field Programmable Gate Array (FPGA). We show that the proposed OpenCL implementation complies with the 5G processing timing requirements, since the computation time is consistent with the present 5G deployment. However, to be suitable for 5G, the OpenCL platform must improve the data transfer latency between hardware and software. Moreover, a further enhancement for the OpenCL implementation is to improve the code by means of OpenCL optimization techniques. In this way, the performance can be further improved with respect to optimized software on vectorized high-end processors.

Heikki Berg - One of the best experts on this subject based on the ideXlab platform.

  • pocl: A Performance-Portable OpenCL Implementation
    International Journal of Parallel Programming, 2015
    Co-Authors: Pekka Jääskeläinen, Carlos Sánchez Lama, Kalle Raiskila, Erik Schnetter, Jarmo Takala, Heikki Berg
    Abstract:

    OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear: multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions, to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by utilizing the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications, when compiled using pocl, were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.

  • pocl: A Performance-Portable OpenCL Implementation
    International Journal of Parallel Programming, 2014
    Co-Authors: Pekka Jääskeläinen, Carlos Sánchez Lama, Kalle Raiskila, Erik Schnetter, Jarmo Takala, Heikki Berg
    Abstract:

    OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear: multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions, to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications, when compiled using pocl, were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.

  • OpenCL implementation of Cholesky matrix decomposition
    International Symposium on System-on-Chip, 2011
    Co-Authors: Claudio Brunelli, Eero Aho, Heikki Berg
    Abstract:

    This paper presents some OpenCL implementations of Cholesky decomposition, a very popular algorithm used in linear algebra and signal processing applications. The Cholesky algorithm is a very interesting candidate for OpenCL implementation since it contains sequential parts besides parallel ones. Furthermore, one step involves just a small amount of computation. These characteristics pose challenges which call for suitable techniques to overcome the limitations of the language. We propose several versions of the implementation of the Cholesky algorithm, then provide an analysis of the trade-off between complexity and performance offered by each of them. We also analyze the differences between execution of the program on GPU and on multicore CPU.

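
To make the mix of sequential and parallel parts in Cholesky decomposition concrete, here is a minimal column-by-column sketch in plain Python. It is a textbook formulation, not one of the versions proposed in the paper; the function name `cholesky` and the test matrix are chosen only for illustration, and the comments mark which loops could plausibly map to OpenCL work-items.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    Assumes `a` is a symmetric positive definite matrix given as a list of rows.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):  # sequential: column j depends on columns 0..j-1
        # Small step with little work: a single diagonal element.
        diag = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        low[j][j] = math.sqrt(diag)
        # Data-parallel step: every row i below the diagonal is independent,
        # so each i could be assigned to its own work-item.
        for i in range(j + 1, n):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - s) / low[j][j]
    return low


factor = cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])
```

The outer loop over columns cannot be parallelized directly, and the diagonal update offers almost no work per step, which is one way to read the challenges the abstract describes.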