Kernel Argument

The Experts below are selected from a list of 15 Experts worldwide, ranked by the ideXlab platform.

Ajitha Rajan - One of the best experts on this subject based on the ideXlab platform.

  • GPGPU@PPoPP - Automated test generation for OpenCL Kernels using fuzzing and constraint solving
    Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2020
    Co-Authors: Chao Peng, Ajitha Rajan
    Abstract:

    Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current CPUs. These advantages, along with recent advances in the programmability of GPUs, have made them attractive for general-purpose computation. Despite these advances, GPU Kernels are hard to code and analyse due to the high complexity of memory sharing patterns, striding patterns for memory accesses, implicit synchronisation, and the combinatorial explosion of thread interleavings. The few existing techniques for testing GPU Kernels use symbolic execution for test generation, which incurs a high overhead, has limited scalability and does not handle all data types. We propose a test generation technique for OpenCL Kernels that combines mutation-based fuzzing and selective constraint solving with the goal of being fast, effective and scalable. Fuzz testing for GPU Kernels has not been explored previously. Our fuzz-testing approach randomly mutates input Kernel Argument values with the goal of increasing branch coverage. When fuzz testing is unable to increase branch coverage with random mutations, we gather path constraints for uncovered branch conditions and invoke the Z3 constraint solver to generate tests for them. In addition to the test generator, we also present a schedule amplifier that simulates multiple work-group schedules under which each generated test is executed; it is designed to help uncover inter-work-group data races. We evaluate the effectiveness of the generated tests and the schedule amplifier using 217 Kernels from open-source projects and industry-standard benchmark suites, measuring branch coverage and fault finding. Our test generation technique achieves close to 100% coverage and mutation score for the majority of the Kernels, and the overhead incurred in test generation is small (0.8 seconds on average). We also confirmed that our technique scales easily to large Kernels and supports all OpenCL data types, including complex data structures.
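
    This is not code from the paper; the Python sketch below only illustrates the mutation-based fuzzing step the abstract describes: Kernel Argument values are mutated at random, and a mutant is kept whenever it covers a new branch. The harness run_kernel_and_get_coverage is a toy stand-in for actually executing an OpenCL Kernel and measuring branch coverage.

        import random

        def mutate(args):
            """Randomly perturb one Kernel Argument value (integers, for simplicity)."""
            mutated = list(args)
            i = random.randrange(len(mutated))
            r = random.random()
            if r < 0.4:
                mutated[i] ^= 1 << random.randrange(32)            # flip a random bit
            elif r < 0.8:
                mutated[i] += random.randint(-16, 16)              # small arithmetic nudge
            else:
                mutated[i] = random.choice([0, 1, -1, 2**31 - 1])  # boundary value
            return mutated

        def run_kernel_and_get_coverage(args):
            """Toy stand-in: each condition mimics a branch of a hypothetical Kernel
            and returns the set of branch ids covered by these argument values."""
            n, stride, flag = args
            return {
                "b0" if n > 0 else "b0_else",
                "b1" if stride % 4 == 0 else "b1_else",
                "b2" if flag == 0x1234 else "b2_else",
            }

        def fuzz(seed_args, budget=1000):
            """Coverage-guided fuzzing loop: keep mutants that increase branch coverage."""
            covered, tests, args = set(), [], list(seed_args)
            for _ in range(budget):
                candidate = mutate(args)
                gained = run_kernel_and_get_coverage(candidate) - covered
                if gained:
                    covered |= gained
                    tests.append(candidate)
                    args = candidate
            return tests, covered

        tests, covered = fuzz([8, 3, 0])
        print(len(tests), "tests,", len(covered), "branches covered")

    The branches such a loop typically fails to reach (here, the flag == 0x1234 branch) are exactly the ones the abstract hands to the Z3 constraint-solving step.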

Martin Campos Pinto - One of the best experts on this subject based on the ideXlab platform.

  • Smooth particle methods without smoothing
    arXiv: Numerical Analysis, 2011
    Co-Authors: Martin Campos Pinto
    Abstract:

    We present a novel class of particle methods with deformable shapes that achieve high-order convergence rates in the supremum norm. These methods require neither remappings, nor extended overlapping, nor vanishing moments for the particles. Unlike classical convergence analyses, our estimates do not rely on a smoothing Kernel Argument but rather on the uniformly bounded overlapping of the particles' supports and on the smoothness of the characteristic flow. In particular, they also apply to heterogeneous "particle approximations" such as piecewise polynomial bases on unstructured meshes. In the first-order case, which simply consists of pushing forward linearly transformed particles (LTP) along the flow, we provide an explicit scheme and establish rigorous error estimates that demonstrate its uniform convergence and the uniform boundedness of the particle overlapping. To illustrate the flexibility of the method we also develop an adaptive multilevel version that includes a local correction filter for positivity-preserving hierarchical approximations. Numerical studies demonstrate the convergence properties of this new particle scheme in both its uniform and adaptive versions, and compare it with traditional fixed-shape particle methods with or without remappings.
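
    As a schematic sketch only (the notation below is illustrative and not taken from the paper), the contrast drawn in the abstract can be written as follows. A classical particle approximation smooths fixed-shape particles with a Kernel $\zeta_\varepsilon$, and its analysis rests on a smoothing Kernel Argument (moment and regularity conditions on $\zeta$):

        \[
        f_\varepsilon(t,x) \;\approx\; \sum_k w_k\,\zeta_\varepsilon\big(x - x_k(t)\big),
        \qquad \zeta_\varepsilon(x) = \varepsilon^{-d}\,\zeta(x/\varepsilon).
        \]

    A linearly transformed particle approximation instead deforms each particle shape by a local linearization of the characteristic flow $F^t$, so the error estimates can rely on the uniformly bounded overlapping of the transported supports rather than on a smoothing Kernel:

        \[
        f_h(t,x) \;\approx\; \sum_k w_k\,\varphi\big(J_k(t)^{-1}\,(x - x_k(t))\big),
        \qquad x_k(t) = F^t(x_k^0), \quad J_k(t) \approx D_x F^t(x_k^0).
        \]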

Chao Peng - One of the best experts on this subject based on the ideXlab platform.

  • GPGPU@PPoPP - Automated test generation for OpenCL Kernels using fuzzing and constraint solving
    Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2020
    Co-Authors: Chao Peng, Ajitha Rajan
    Abstract:

    Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current CPUs. These advantages, along with recent advances in the programmability of GPUs, have made them attractive for general-purpose computation. Despite these advances, GPU Kernels are hard to code and analyse due to the high complexity of memory sharing patterns, striding patterns for memory accesses, implicit synchronisation, and the combinatorial explosion of thread interleavings. The few existing techniques for testing GPU Kernels use symbolic execution for test generation, which incurs a high overhead, has limited scalability and does not handle all data types. We propose a test generation technique for OpenCL Kernels that combines mutation-based fuzzing and selective constraint solving with the goal of being fast, effective and scalable. Fuzz testing for GPU Kernels has not been explored previously. Our fuzz-testing approach randomly mutates input Kernel Argument values with the goal of increasing branch coverage. When fuzz testing is unable to increase branch coverage with random mutations, we gather path constraints for uncovered branch conditions and invoke the Z3 constraint solver to generate tests for them. In addition to the test generator, we also present a schedule amplifier that simulates multiple work-group schedules under which each generated test is executed; it is designed to help uncover inter-work-group data races. We evaluate the effectiveness of the generated tests and the schedule amplifier using 217 Kernels from open-source projects and industry-standard benchmark suites, measuring branch coverage and fault finding. Our test generation technique achieves close to 100% coverage and mutation score for the majority of the Kernels, and the overhead incurred in test generation is small (0.8 seconds on average). We also confirmed that our technique scales easily to large Kernels and supports all OpenCL data types, including complex data structures.
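
    Not the paper's code: the z3-py sketch below illustrates the selective constraint-solving step for a hypothetical uncovered branch condition, say if (gid * stride == offset && n > 1024), collected as a path constraint from an OpenCL Kernel; get_global_id(0) is modelled as a bounded integer gid.

        from z3 import Ints, Solver, And, sat

        # Hypothetical path constraint for a branch that fuzzing could not reach:
        #   if (gid * stride == offset && n > 1024) { ... }
        gid, stride, offset, n = Ints("gid stride offset n")

        s = Solver()
        s.add(And(gid >= 0, gid < 256))   # model get_global_id(0) over one work-group
        s.add(gid * stride == offset)     # first conjunct of the branch condition
        s.add(n > 1024)                   # second conjunct of the branch condition

        if s.check() == sat:
            m = s.model()
            # Concrete Kernel Argument values that steer execution into the branch
            test = {name: m.eval(var, model_completion=True).as_long()
                    for name, var in [("stride", stride), ("offset", offset), ("n", n)]}
            print("generated test inputs:", test)
        else:
            print("branch is infeasible under this model")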

Eduard Ayguade Parra - One of the best experts on this subject based on the ideXlab platform.

  • LCPC - OmpSs-OpenCL Programming Model for Heterogeneous Systems
    Languages and Compilers for Parallel Computing, 2013
    Co-Authors: Vinoth Krishnan Elangovan, Rosa M. Badia, Eduard Ayguade Parra
    Abstract:

    The advent of heterogeneous computing has forced programmers to use platform-specific programming paradigms in order to achieve maximum performance. This approach has a steep learning curve and a detrimental influence on productivity and code reusability. To help with this situation, OpenCL, an open parallel computing API for cross-platform computation, was conceived. OpenCL provides a homogeneous view of the computational resources (CPU and GPU), thereby enabling software portability across different platforms. Although OpenCL resolves software portability issues, the programming paradigm offers low programmability and additionally falls short in performance. In this paper we focus on integrating the OpenCL framework with the OmpSs task-based programming model, using the Nanos runtime infrastructure, to address these shortcomings. This enables the programmer to skip cumbersome OpenCL constructs, including platform creation, compilation, Kernel building, Kernel Argument setting and memory transfers, and instead write a sequential program with annotated pragmas. Our proposal focuses on exploiting the underlying hardware platform with greater ease of programming while gaining significant performance from the data parallelism offered by the OpenCL runtime on GPUs and multicore architectures. We have evaluated the platform with important benchmarks and observed substantially easier programming with comparable performance.
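
    To make the boilerplate concrete, here is a minimal host-side sketch (Python with pyopencl, not taken from the paper) spelling out the steps the abstract lists: platform/context creation, compilation and Kernel building, memory transfers, and Kernel Argument setting. The scale Kernel is an illustrative placeholder; in the OmpSs-OpenCL model described above these steps are replaced by a sequential program with task pragmas, and the Nanos runtime performs the argument setting and transfers.

        import numpy as np
        import pyopencl as cl

        # 1. Platform/context creation and a command queue
        ctx = cl.create_some_context()
        queue = cl.CommandQueue(ctx)

        # 2. Compilation and Kernel building
        src = """
        __kernel void scale(__global const float *a, __global float *out, const float k) {
            int gid = get_global_id(0);
            out[gid] = k * a[gid];
        }
        """
        prg = cl.Program(ctx, src).build()

        # 3. Memory transfers: host -> device buffers
        a = np.arange(1024, dtype=np.float32)
        mf = cl.mem_flags
        a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
        out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

        # 4. Kernel Argument setting and launch over the ND-range
        prg.scale(queue, a.shape, None, a_buf, out_buf, np.float32(2.0))

        # 5. Memory transfer: device -> host
        out = np.empty_like(a)
        cl.enqueue_copy(queue, out, out_buf)
        assert np.allclose(out, 2.0 * a)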

Vinoth Krishnan Elangovan - One of the best experts on this subject based on the ideXlab platform.

  • LCPC - OmpSs-OpenCL Programming Model for Heterogeneous Systems
    Languages and Compilers for Parallel Computing, 2013
    Co-Authors: Vinoth Krishnan Elangovan, Rosa M. Badia, Eduard Ayguade Parra
    Abstract:

    The advent of heterogeneous computing has forced programmers to use platform specific programming paradigms in order to achieve maximum performance. This approach has a steep learning curve for programmers and also has detrimental influence on productivity and code re-usability. To help with this situation, OpenCL an open-source, parallel computing API for cross platform computations was conceived. OpenCL provides a homogeneous view of the computational resources (CPU and GPU) thereby enabling software portability across different platforms. Although OpenCL resolves software portability issues, the programming paradigm presents low programmability and additionally falls short in performance. In this paper we focus on integrating OpenCL framework with the OmpSs task based programming model using Nanos run time infrastructure to address these shortcomings. This would enable the programmer to skip cumbersome OpenCL constructs including OpenCL plaform creation, compilation, Kernel building, Kernel Argument setting and memory transfers, instead write a sequential program with annotated pragmas. Our proposal mainly focuses on how to exploit the best of the underlying hardware platform with greater ease in programming and to gain significant performance using the data parallelism offered by the OpenCL run time for GPUs and multicore architectures. We have evaluated the platform with important benchmarks and have noticed substantial ease in programming with comparable performance.