OpenCL Application

The Experts below are selected from a list of 2349 Experts worldwide ranked by ideXlab platform

David Kaeli - One of the best experts on this subject based on the ideXlab platform.

  • Visualization of OpenCL Application Execution on CPU-GPU Systems
    Workshop On Computer Architecture Education, 2015
    Co-Authors: Amir Kavyan Ziabari, Dana Schaa, Rafael Ubal, David Kaeli
    Abstract:

    Evaluating the performance of parallel and heterogeneous programs and architectures can be challenging. An emulator or simulator can be used to aid the programmer. To provide guidance and feedback to the programmer, the simulator needs to present traces, reports, and debugging information in a coherent and unambiguous format. Although these outputs contain a lot of detailed information about the logical and physical transactions during execution, they are usually extremely large and hard to analyze. What is needed is an interface into the simulator that can help programmers and architects sift through this myriad of data. In this contribution, we describe the M2S-Visual trace-driven visualization tool, a complementary addition to the Multi2sim (M2S) heterogeneous system simulator. M2S-Visual provides a graphical representation of parallel program execution on the simulator. M2S is an established simulator, designed with an emphasis on simulating the execution of parallel applications on graphics processing units, and provides a number of instrumentation capabilities that enable research in architecture exploration and application characterization. This visualization framework, added to Multi2sim, aims to complement (and potentially replace) text-based statistical profiling, enabling the user to better learn and understand each software transaction executed on the simulated hardware. While M2S supports emulation of both OpenCL and CUDA programs, our visualization framework presently supports only OpenCL execution. M2S supports execution on both CPUs (x86, ARM, and MIPS) and GPUs (AMD Evergreen and Southern Islands, and NVIDIA Fermi and Kepler), but presently supports detailed visualization only on a multicore x86 CPU and AMD Evergreen and Southern Islands GPUs. Besides supporting OpenCL programming and debugging, an additional goal is to deliver a reliable product for teaching the details of parallel programming execution on heterogeneous systems. Given the move to many-core architectures in the industry, this toolset is timely and addresses a growing gap in our educational infrastructure. The tool is also designed to support the research community, providing analysis of performance bottlenecks of OpenCL programs. We also incorporated the option to produce visualization graphs, which provide deeper insight into application performance and hardware resource utilization.

  • A Framework for Visualization of OpenCL Applications Execution: A Tutorial
    International Workshop on OpenCL, 2015
    Co-Authors: Amir Kavyan Ziabari, Dana Schaa, Rafael Ubal Tena, David Kaeli
    Abstract:

    Evaluating parallel and heterogeneous programs written in OpenCL can be challenging. Simulators are commonly used to aid the programmer in this regard. One of the fundamental requirements of any simulator is to provide traces, reports, and debugging information in a coherent and unambiguous format. Although these traces or reports contain a lot of detailed information about the logical and physical transactions within a simulated structure, they are usually extremely large and hard to analyze. What is needed is an appropriate visualization tool to accompany the simulator and make the OpenCL execution process easier to understand and analyze. In this tutorial, we present the M2S-Visual interactive, cycle-by-cycle, trace-driven visualization tool, a complementary addition to Multi2sim (M2S). M2S is an established simulator, designed with an emphasis on running OpenCL applications without any source code modifications. The simulation of a complete OpenCL application occurs seamlessly by launching vendor-compliant host and device binaries. The Multi2sim GPU emulator provides traces of Intel x86 CPU and AMD Southern Islands (as well as AMD Evergreen) GPU instructions, and the detailed simulator tracks execution times and the state of architectural components in both host and device. M2S-Visual complements the simulator by presenting the running instructions and the state of the architectural components together through a user-friendly GUI. During the execution of an OpenCL application, M2S-Visual captures and represents the state of CPU and GPU software entities (i.e., contexts, work-groups, wavefronts, and work-items), memory entities (i.e., accesses, sharers, and owners), and network entities (i.e., messages and packets), along with the state of CPU and GPU hardware resources (i.e., cores and compute units), the memory hierarchy (i.e., L1 cache, L2 cache, and main memory), and network resources (i.e., nodes, buses, links, and buffers). We designed the M2S-Visual tool to support the research community by providing deep analysis of the performance of OpenCL programs. We also introduce other new visualization options (through statistical graphs) in M2S, which provide further details on OpenCL application characteristics and utilization of system resources. These include plots that reveal the occupancy of compute units based on static and run-time characteristics of the executed OpenCL kernels, histograms that present the memory access patterns of the OpenCL applications, plots that characterize the network traffic generated by transactions between memory modules during an OpenCL application execution, and plots that reveal the utilization of network resources (such as links and buses) after the application execution is complete. The tutorial is organized in two parts, covering the full-system visualization of OpenCL application execution via M2S-Visual, and the characterization of OpenCL application impact on system resources using the generated static graphs. Each section is accompanied by simulation examples using working demos. All material to reproduce these demos, as well as the tutorial slides, will be available on the tutorial website at http://www.multi2sim.org/conferences/iwocl-2015.html.
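
    The occupancy plots mentioned in the abstract relate a kernel's static resource needs to how many wavefronts a compute unit can hold at once. The sketch below is a simplified model of that idea, not the calculation Multi2sim performs. The wavefront size of 64 work-items and every value in `LIMITS` are assumptions chosen for illustration, as are the function and parameter names.

```python
import math

WAVEFRONT_SIZE = 64  # assumed work-items per wavefront (illustrative)

# Hypothetical per-compute-unit limits; real values depend on the GPU model.
LIMITS = {"max_wavefronts": 40, "local_mem_bytes": 65536, "registers": 65536}


def resident_wavefronts(work_group_size, local_mem_per_group, regs_per_item, limits):
    """Estimate how many wavefronts fit on one compute unit at the same time."""
    waves_per_group = math.ceil(work_group_size / WAVEFRONT_SIZE)
    # Each resource independently caps the number of resident work-groups.
    by_waves = limits["max_wavefronts"] // waves_per_group
    by_local = (limits["local_mem_bytes"] // local_mem_per_group
                if local_mem_per_group else by_waves)
    by_regs = (limits["registers"] // (regs_per_item * work_group_size)
               if regs_per_item else by_waves)
    groups = min(by_waves, by_local, by_regs)
    return groups * waves_per_group


# Example kernel: 256 work-items per group, 16 KiB local memory, 32 registers per item.
resident = resident_wavefronts(256, 16384, 32, LIMITS)
occupancy = resident / LIMITS["max_wavefronts"]
print(resident, occupancy)
```

    With these assumed limits, local memory is the binding constraint: only four work-groups fit, so 16 of the 40 wavefront slots are used.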

  • Analyzing Program Flow within a Many-Kernel OpenCL Application
    General Purpose Processing on Graphics Processing Units, 2011
    Co-Authors: Perhaad Mistry, David Kaeli, Chris Gregg, Norman Rubin, Kim Hazelwood
    Abstract:

    Many developers have begun to realize that heterogeneous multi-core and many-core computer systems can provide significant performance opportunities to a range of applications. Typical applications possess multiple components that can be parallelized; developers need to be equipped with proper performance tools to analyze program flow and identify application bottlenecks. In this paper, we analyze and profile the components of the Speeded Up Robust Features (SURF) computer vision algorithm written in OpenCL. Our profiling framework is developed using built-in OpenCL API function calls, without the need for an external profiler. We show that we can begin to identify performance bottlenecks and performance issues present in individual components on different hardware platforms. We demonstrate that run-time profiling based on the OpenCL specification can provide an application developer with a fine-grained look at performance, and that this information can be used to tailor performance improvements for specific platforms.
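
    OpenCL exposes per-command timestamps (queued, submitted, start, end, in nanoseconds) through `clGetEventProfilingInfo`, which is the kind of built-in call the abstract refers to. The sketch below shows only the aggregation step one might apply to such timestamps. It is not the authors' framework: the kernel names and timestamp values are hypothetical, and in a real program the tuples would be filled from the OpenCL runtime.

```python
from collections import defaultdict


def summarize_kernel_events(events):
    """Aggregate per-kernel timings from (name, queued, submit, start, end) tuples in ns."""
    totals = defaultdict(lambda: {"calls": 0, "wait_ns": 0, "exec_ns": 0})
    for name, queued, submit, start, end in events:
        entry = totals[name]
        entry["calls"] += 1
        entry["wait_ns"] += start - queued  # time between enqueue and start of execution
        entry["exec_ns"] += end - start     # time spent executing on the device
    return dict(totals)


def bottleneck(summary):
    """Return the kernel name with the largest accumulated execution time."""
    return max(summary, key=lambda name: summary[name]["exec_ns"])


# Hypothetical timestamps for two made-up kernel names.
EVENTS = [
    ("build_integral_image", 0, 100, 1_000, 5_000),
    ("compute_descriptors", 200, 300, 5_000, 25_000),
    ("build_integral_image", 400, 500, 25_000, 28_000),
]

summary = summarize_kernel_events(EVENTS)
print(bottleneck(summary), summary)
```

    Separating wait time from execution time helps distinguish a slow kernel from one that is merely stalled behind earlier commands in the queue.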

Deshanand P. Singh - One of the best experts on this subject based on the ideXlab platform.

  • From OpenCL to High-Performance Hardware on FPGAs
    22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
    Co-Authors: Tomasz S. Czajkowski, Utku Aydonat, Dmitry Denisenko, John Freeman, Michael Kinsner, David Neto, Peter Yiannacouras, Jason Wong, Deshanand P. Singh
    Abstract:

    We present an OpenCL compilation framework to generate high-performance hardware for FPGAs. For an OpenCL application comprising a host program and a set of kernels, it compiles the host program, generates Verilog HDL for each kernel, compiles the circuit using Altera Complete Design Suite 12.0, and downloads the compiled design onto an FPGA. We can then run the application by executing the host program on a Windows(tm)-based machine, which communicates with kernels on an FPGA using a PCIe interface. We implement four applications on an Altera Stratix IV and present the throughput and area results for each application. We show that we can achieve a clock frequency in excess of 160 MHz on our benchmarks, and that the OpenCL computing paradigm is a viable design entry method for high-performance computing applications on FPGAs.

Jaejin Lee - One of the best experts on this subject based on the ideXlab platform.

  • SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters
    International Conference on Supercomputing, 2012
    Co-Authors: Jung-won Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Jaejin Lee
    Abstract:

    In this paper, we propose SnuCL, an OpenCL framework for heterogeneous CPU/GPU clusters. We show that the original OpenCL semantics naturally fit the heterogeneous cluster programming environment, and that the framework achieves high performance and ease of programming. The target cluster architecture consists of a single designated host node and many compute nodes. They are connected by an interconnection network, such as Gigabit Ethernet or InfiniBand switches. Each compute node is equipped with multicore CPUs and multiple GPUs. A set of CPU cores or each GPU becomes an OpenCL compute device. The host node executes the host program in an OpenCL application. SnuCL provides the user with a system image running a single operating system instance for heterogeneous CPU/GPU clusters. It allows the application to utilize compute devices in a compute node as if they were in the host node. No communication API, such as the MPI library, is required in the application source. SnuCL also provides collective communication extensions to OpenCL to facilitate manipulating memory objects. With SnuCL, an OpenCL application becomes portable not only between heterogeneous devices in a single node, but also between compute devices in the cluster environment. We implement SnuCL and evaluate its performance using eleven OpenCL benchmark applications.
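
    The idea of exposing devices on remote compute nodes as if they were local can be pictured as a flat device table on the host. The sketch below illustrates only that bookkeeping. It is not SnuCL's implementation, and the node names, device kinds, and function name are all hypothetical.

```python
def build_device_table(cluster):
    """Flatten {node: [device kinds]} into one list so each device gets a global index."""
    table = []
    for node in sorted(cluster):
        for local_index, kind in enumerate(cluster[node]):
            table.append({"node": node, "local_index": local_index, "kind": kind})
    return table


# Hypothetical cluster layout: two compute nodes with CPU and GPU devices.
CLUSTER = {"node0": ["cpu", "gpu", "gpu"], "node1": ["cpu", "gpu"]}

table = build_device_table(CLUSTER)
for global_index, device in enumerate(table):
    print(global_index, device)
```

    The host program would address devices by global index only; a runtime layer, not shown here, would be responsible for forwarding commands to the owning node.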

  • Achieving a Single Compute Device Image in OpenCL for Multiple GPUs
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011
    Co-Authors: Jung-won Kim, Hong-gyu Kim, Joo Hwan Lee, Jaejin Lee
    Abstract:

    In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to a platform that has multiple GPU devices. It also lets the application exploit the full computing power of the multiple GPU devices and the total amount of GPU memory available in the platform. Our OpenCL framework automatically distributes, at run time, the OpenCL kernel written for a single GPU into multiple CUDA kernels that execute on the multiple GPU devices. It applies a run-time memory access range analysis to the kernel by performing a sampling run and identifies an optimal workload distribution for the kernel. To achieve a single compute device image, the runtime maintains virtual device memory that is allocated in main memory. The OpenCL runtime treats this memory as if it were the memory of a single GPU device and keeps it consistent with the memories of the multiple GPU devices. Our OpenCL-C-to-C translator generates the sampling code from the OpenCL kernel code, and our OpenCL-C-to-CUDA-C translator generates the CUDA kernel code for the distributed OpenCL kernel. We show the effectiveness of our OpenCL framework by implementing the OpenCL runtime and two source-to-source translators. We evaluate its performance with a system that contains 8 GPUs using 11 OpenCL benchmark applications.
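
    Distributing one kernel launch over several GPUs amounts to giving each device a range of work-groups and knowing which part of each buffer that range touches. The sketch below uses a plain even split and assumes a linear access pattern in which every work-item reads a fixed number of consecutive elements at its global ID. The paper instead derives access ranges from a sampling run, which is not reproduced here; all names are hypothetical.

```python
def split_work_groups(num_groups, num_devices):
    """Split work-group IDs [0, num_groups) into contiguous half-open ranges, one per device."""
    base, extra = divmod(num_groups, num_devices)
    ranges = []
    start = 0
    for dev in range(num_devices):
        count = base + (1 if dev < extra else 0)  # first `extra` devices take one more group
        ranges.append((start, start + count))
        start += count
    return ranges


def accessed_range(group_range, local_size, elems_per_item=1):
    """Half-open buffer index range touched by a work-group range (assumed linear access)."""
    first, last = group_range
    stride = local_size * elems_per_item
    return (first * stride, last * stride)


ranges = split_work_groups(10, 4)
print(ranges)
print([accessed_range(r, local_size=64) for r in ranges])
```

    For 10 work-groups on 4 devices this yields the ranges (0, 3), (3, 6), (6, 8), and (8, 10), so only the matching slice of each buffer would need to be copied to each GPU under the assumed access pattern.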

Amir Kavyan Ziabari - One of the best experts on this subject based on the ideXlab platform.

  • Visualization of OpenCL Application Execution on CPU-GPU Systems
    Workshop On Computer Architecture Education, 2015
    Co-Authors: Amir Kavyan Ziabari, Dana Schaa, Rafael Ubal, David Kaeli

  • A Framework for Visualization of OpenCL Applications Execution: A Tutorial
    International Workshop on OpenCL, 2015
    Co-Authors: Amir Kavyan Ziabari, Dana Schaa, Rafael Ubal Tena, David Kaeli

Tomasz S. Czajkowski - One of the best experts on this subject based on the ideXlab platform.

  • From OpenCL to High-Performance Hardware on FPGAs
    22nd International Conference on Field Programmable Logic and Applications (FPL), 2012
    Co-Authors: Tomasz S. Czajkowski, Utku Aydonat, Dmitry Denisenko, John Freeman, Michael Kinsner, David Neto, Peter Yiannacouras, Jason Wong, Deshanand P. Singh