OpenCL Implementation

The experts below are selected from a list of 3162 experts worldwide ranked by the ideXlab platform.

E. Baron - One of the best experts on this subject based on the ideXlab platform.

A 3D radiative transfer framework: VIII. OpenCL Implementation

Astronomy & Astrophysics, 2011

Co-Authors: Peter H. Hauschildt, E. Baron

Abstract:

Aims. We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. Methods. We implemented the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal (x, y) plane, including the construction of the nearest neighbor Λ* and the operator splitting step. Results. We present the results of both a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. Conclusions. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.

A 3D radiative transfer framework: XIII. OpenCL Implementation

arXiv: Instrumentation and Methods for Astrophysics, 2011

Co-Authors: Peter H. Hauschildt, E. Baron

Abstract:

We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. We implement the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal $(x,y)$ plane, including the construction of the nearest neighbor $\Lambda^*$ and the operator splitting step. We present the results of a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single core) computations.


Luca Valcarenghi - One of the best experts on this subject based on the ideXlab platform.

Is OpenCL Driven Reconfigurable Hardware Suitable for Virtualising 5G Infrastructure?

IEEE Transactions on Network and Service Management, 2020

Co-Authors: Federico Civerchia, Koteswararao Kondepu, Luca Maggiani, Maxime Pelcat, Piero Castoldi, Luca Valcarenghi

Abstract:

The Open Computing Language (OpenCL) is increasingly adopted for programming processors with reconfigurable hardware acceleration. The 5G telecommunication infrastructure, imposing strong latency constraints on the managed communications, may benefit from OpenCL-designed accelerated processing. This paper presents the first study to evaluate OpenCL hardware acceleration in the context of a 5G base station physical layer. The implementation and optimization process to accelerate the Orthogonal Frequency Division Multiplexing (OFDM) part of the 5G downlink is conducted on a high-end Field Programmable Gate Array (FPGA). We show that the proposed OpenCL implementation complies with the 5G processing timing requirements since the computation time is consistent with the present 5G deployment. However, to be suitable for 5G, the OpenCL platform must improve the data latency transfer between hardware and software. Moreover, a further enhancement for the OpenCL implementation is to improve the code by means of OpenCL optimization techniques. In this way, the performance can be further improved with respect to optimized software on vectorized high-end processors.
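The OFDM stage named in this abstract can be illustrated with a minimal sketch: frequency-domain subcarrier symbols are turned into time-domain samples by an inverse DFT, and a cyclic prefix (a copy of the last samples) is prepended. This is plain Python for illustration only, not the paper's OpenCL/FPGA design; the function name `ofdm_symbol`, the naive O(N²) transform, and the toy sizes are assumptions made here.

```python
import cmath


def ofdm_symbol(subcarriers, cp_len):
    """Build one OFDM symbol: inverse DFT of the subcarriers plus a cyclic prefix.

    Hypothetical helper for illustration; real 5G numerologies use far larger
    transforms and an FFT rather than this naive O(N^2) sum.
    """
    n = len(subcarriers)
    if not 0 <= cp_len <= n:
        raise ValueError("cp_len must be between 0 and the number of subcarriers")
    # Inverse DFT: each time sample is a weighted sum over all subcarriers.
    # Every output sample is independent, which is what makes this stage
    # attractive for parallel hardware.
    time_samples = [
        sum(subcarriers[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    # Cyclic prefix: repeat the last cp_len samples in front of the symbol.
    return time_samples[n - cp_len:] + time_samples


if __name__ == "__main__":
    # Four QPSK-like symbols, cyclic prefix of one sample.
    symbol = ofdm_symbol([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], cp_len=1)
    print(len(symbol))
```

Because each time sample depends only on the input subcarriers, the outer loop over `t` maps naturally onto independent work-items in an accelerator implementation.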


Heikki Berg - One of the best experts on this subject based on the ideXlab platform.

pocl: A Performance-Portable OpenCL Implementation

International Journal of Parallel Programming, 2015

Co-Authors: Pekka Jääskeläinen, Carlos Sánchez Lama, Kalle Raiskila, Erik Schnetter, Jarmo Takala, Heikki Berg

Abstract:

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by using the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications when compiled using pocl were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.
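The separation between parallel region formation and target-specific mapping described in this abstract can be pictured with a toy model. In the sketch below, a kernel is already split into regions at its barrier, and the "mapping" is the simplest one possible: a serial loop over the work-items of one work-group. This is an illustration under stated assumptions, not pocl's code (pocl operates on LLVM IR); all names (`run_work_group`, `stage_load`, `stage_store`) are hypothetical.

```python
def run_work_group(regions, local_size, *buffers):
    """Execute barrier-separated regions for one work-group, serially.

    Each region is the code between two barriers. Running a region for every
    work-item before starting the next region preserves barrier semantics.
    """
    for region in regions:             # an implicit barrier sits between regions
        for lid in range(local_size):  # work-item loop: one iteration per work-item
            region(lid, local_size, *buffers)


def stage_load(lid, n, src, dst, scratch):
    # Region before the barrier: each work-item copies one element to local memory.
    scratch[lid] = src[lid]


def stage_store(lid, n, src, dst, scratch):
    # Region after the barrier: each work-item reads an element written by another.
    dst[lid] = scratch[n - 1 - lid]


def reverse_with_local_memory(src):
    """Reverse a list using the two-region toy kernel above."""
    n = len(src)
    dst = [None] * n
    scratch = [None] * n
    run_work_group([stage_load, stage_store], n, src, dst, scratch)
    return dst


if __name__ == "__main__":
    print(reverse_with_local_memory([1, 2, 3, 4]))
```

The inner loop is where a target-specific mapping would differ: the iterations within a region are independent, so they could instead be vectorized or spread over parallel function units without changing the regions themselves.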

pocl: A Performance-Portable OpenCL Implementation

International Journal of Parallel Programming, 2014

Co-Authors: Pekka Jääskeläinen, Carlos Sánchez Lama, Kalle Raiskila, Erik Schnetter, Jarmo Takala, Heikki Berg

Abstract:

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse due to multiple proprietary vendor implementations with different characteristics, and, thus, required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. Our results show that most of the benchmarked applications when compiled using pocl were faster or close to as fast as the best proprietary OpenCL implementation for the platform at hand.

OpenCL Implementation of Cholesky matrix decomposition

International Symposium on System-on-Chip, 2011

Co-Authors: Claudio Brunelli, Eero Aho, Heikki Berg

Abstract:

This paper presents some OpenCL implementations for Cholesky decomposition, a very popular algorithm used in linear algebra and signal processing applications. The Cholesky algorithm represents a very interesting candidate for OpenCL implementation since it contains sequential parts besides parallel ones. Furthermore, one step involves just a small amount of calculations. These characteristics pose challenges which call for suitable techniques to overcome the limitations of the language. We propose several versions of the implementation of the Cholesky algorithm, then provide an analysis of the trade-off between complexity and performance offered by each of them. We also analyze the differences between execution of the program on GPU and on multicore CPU.
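The mix of sequential and parallel parts mentioned in this abstract is visible in a plain right-looking Cholesky factorization. The sketch below is a serial Python reference, not any of the paper's OpenCL versions; the function name `cholesky_columnwise` and the step labels in the comments are assumptions introduced here to mark which loops carry dependencies and which do not.

```python
import math


def cholesky_columnwise(a):
    """Return lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    n = len(a)
    # Copy the lower triangle so the input is left untouched; upper part is zero.
    l = [[float(a[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):  # sequential: column k needs the updates from columns 0..k-1
        # Step 1 (tiny and serial): a single square root on the diagonal.
        if l[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[k][k] = math.sqrt(l[k][k])
        # Step 2 (parallel over i): scale the entries below the diagonal.
        for i in range(k + 1, n):
            l[i][k] /= l[k][k]
        # Step 3 (parallel over i and j): rank-1 update of the trailing submatrix.
        for j in range(k + 1, n):
            for i in range(j, n):
                l[i][j] -= l[i][k] * l[j][k]
    return l


if __name__ == "__main__":
    matrix = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in cholesky_columnwise(matrix):
        print(row)
```

Steps 2 and 3 have independent iterations and are natural candidates for parallel execution, while the outer loop and the single square root in step 1 must run in order, which is the kind of structure the abstract describes as challenging.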

