Runtime Binary

The experts below are selected from a list of 51 experts worldwide, ranked by the ideXlab platform.

Penchung Yew - One of the best experts on this subject based on the ideXlab platform.

  • COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications
    2007 International Conference on Parallel Processing (ICPP 2007), 2007
    Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew
    Abstract:

    This paper presents COBRA (continuous binary re-adaptation), a runtime binary optimization framework for multithreaded applications. It is currently implemented on Itanium 2 based SMP and cc-NUMA systems. Using the OpenMP NAS parallel benchmarks, we show how COBRA can adaptively choose appropriate optimizations according to observed changes in runtime program behavior. Coherent cache misses caused by true/false data sharing often limit the scalability of multithreaded applications. This paper shows that COBRA can significantly improve the performance of some applications parallelized with OpenMP by reducing the aggressiveness of data prefetching and by using exclusive hints for prefetch instructions. For example, we show that COBRA can improve the performance of the OpenMP NAS parallel benchmarks by up to 68%, with an average of 17.5%, on the SGI Altix cc-NUMA system.

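
The abstract above describes choosing between optimizations based on observed runtime behavior. A minimal Python sketch of that kind of decision follows. It is not COBRA's actual policy: the `Sample` fields, the two thresholds, and the policy names are all hypothetical, and the rule that store-heavy intervals favor exclusive hints is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampling interval of (hypothetical) hardware counter readings."""
    cache_misses: int       # all cache misses seen in the interval
    coherent_misses: int    # misses attributed to true/false sharing
    store_fraction: float   # share of the missing accesses that are stores


def choose_prefetch_policy(sample, coherent_threshold=0.3, store_threshold=0.5):
    """Pick a prefetch policy for the next interval.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if sample.cache_misses == 0:
        return "keep"
    coherent_ratio = sample.coherent_misses / sample.cache_misses
    if coherent_ratio < coherent_threshold:
        # Sharing is not the bottleneck: leave the compiler's prefetches alone.
        return "keep"
    if sample.store_fraction >= store_threshold:
        # Assumption: lines that are about to be written benefit from
        # being prefetched with an exclusive hint.
        return "exclusive_hint"
    # Otherwise make prefetching less aggressive to cut coherence traffic.
    return "reduce_prefetch"


# Re-evaluate the policy as the program moves through phases.
phases = [Sample(1000, 100, 0.9), Sample(1000, 600, 0.2), Sample(1000, 600, 0.8)]
for interval, sample in enumerate(phases):
    print(interval, choose_prefetch_policy(sample))
```

Re-running the decision on every interval is what lets the choice follow program phases rather than being fixed at compile time.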

James E. Smith - One of the best experts on this subject based on the ideXlab platform.

  • Reducing Startup Time in Co-Designed Virtual Machines
    33rd International Symposium on Computer Architecture (ISCA'06); ACM SIGARCH Computer Architecture News, 2006
    Co-Authors: James E. Smith
    Abstract:

    A co-designed virtual machine allows designers to implement a processor via a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-specific ISA. Because translation is done dynamically, an important consideration in such systems is the startup time for performing the initial translations. Beginning with a previously proposed co-designed VM that implements the x86 ISA, we study the effects of runtime binary translation overhead. The co-designed x86 virtual machine is based on an adaptive translation system that uses a basic block translator for initial emulation and a superblock translator for hotspot optimization. We analyze and model VM startup performance via simulation. We observe that non-hotspot emulation via basic block translation is the major part of the startup overhead. To reduce startup translation overhead, we follow the co-designed hardware/software philosophy and propose hardware assists to dramatically accelerate basic block translations. By combining hardware assists with balanced translation strategies, the co-designed translation system reduces runtime overhead significantly and demonstrates very competitive startup performance when compared with conventional processors running a set of Windows application benchmarks.
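
The two-stage scheme in the abstract above (a basic block translator for initial emulation, a superblock translator for hotspots) can be sketched as a toy in Python. This is only an illustration under stated assumptions: the class `StagedTranslator`, the execution-count trigger, and the `hot_threshold` value are hypothetical, and translations are stand-in strings rather than real code.

```python
class StagedTranslator:
    """Toy model of staged translation; not the paper's implementation."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # assumed hotspot trigger
        self.bb_cache = {}      # block address -> basic block translation
        self.sb_cache = {}      # block address -> superblock translation
        self.exec_counts = {}   # block address -> executions so far
        self.log = []           # translation events, in order

    def execute(self, addr):
        """Return the translation used to run the block at addr."""
        if addr in self.sb_cache:
            # Hot code runs from the optimized superblock translation.
            return self.sb_cache[addr]
        if addr not in self.bb_cache:
            # First touch: cheap basic block translation (the startup cost).
            self.bb_cache[addr] = f"bb@{addr:#x}"
            self.log.append(("bb_translate", addr))
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if self.exec_counts[addr] >= self.hot_threshold:
            # The block became hot: build the optimized version for next time.
            self.sb_cache[addr] = f"sb@{addr:#x}"
            self.log.append(("sb_translate", addr))
        return self.bb_cache[addr]


vm = StagedTranslator(hot_threshold=3)
trace = [vm.execute(0x10) for _ in range(4)]  # a hot loop body
vm.execute(0x20)                              # a block that stays cold
print(trace)
print(vm.log)
```

In this toy, every block pays for a basic block translation on first touch, while only blocks that cross the threshold pay for optimization, which matches the abstract's observation that non-hotspot basic block translation accounts for most of the startup overhead.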

Jinpyo Kim - One of the best experts on this subject based on the ideXlab platform.

  • COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications
    2007 International Conference on Parallel Processing (ICPP 2007), 2007
    Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew
    Abstract:

    This paper presents COBRA (continuous binary re-adaptation), a runtime binary optimization framework for multithreaded applications. It is currently implemented on Itanium 2 based SMP and cc-NUMA systems. Using the OpenMP NAS parallel benchmarks, we show how COBRA can adaptively choose appropriate optimizations according to observed changes in runtime program behavior. Coherent cache misses caused by true/false data sharing often limit the scalability of multithreaded applications. This paper shows that COBRA can significantly improve the performance of some applications parallelized with OpenMP by reducing the aggressiveness of data prefetching and by using exclusive hints for prefetch instructions. For example, we show that COBRA can improve the performance of the OpenMP NAS parallel benchmarks by up to 68%, with an average of 17.5%, on the SGI Altix cc-NUMA system.


Weichung Hsu - One of the best experts on this subject based on the ideXlab platform.

  • COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications
    2007 International Conference on Parallel Processing (ICPP 2007), 2007
    Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew
    Abstract:

    This paper presents COBRA (continuous binary re-adaptation), a runtime binary optimization framework for multithreaded applications. It is currently implemented on Itanium 2 based SMP and cc-NUMA systems. Using the OpenMP NAS parallel benchmarks, we show how COBRA can adaptively choose appropriate optimizations according to observed changes in runtime program behavior. Coherent cache misses caused by true/false data sharing often limit the scalability of multithreaded applications. This paper shows that COBRA can significantly improve the performance of some applications parallelized with OpenMP by reducing the aggressiveness of data prefetching and by using exclusive hints for prefetch instructions. For example, we show that COBRA can improve the performance of the OpenMP NAS parallel benchmarks by up to 68%, with an average of 17.5%, on the SGI Altix cc-NUMA system.


Hideharu Amano - One of the best experts on this subject based on the ideXlab platform.

  • A Domain Specific Language and Toolchain for OpenCV Runtime Binary Acceleration Using GPU
    2012 Third International Conference on Networking and Computing (ICNC), 2012
    Co-Authors: Takaaki Miyajima, David B Thomas, Hideharu Amano
    Abstract:

    Computationally intensive applications, such as those built on OpenCV, can be off-loaded to accelerators to reduce execution time. However, developing an accelerated system takes a significant amount of time: the developer must first choose an accelerator and decide which parts to off-load, then port the off-loaded kernels to the accelerator using many accelerator-specific tools. In addition to the low-level parallelism of the accelerator, the developer also needs to extract and utilize system-level parallelism found within the application, while making sure that the application still executes correctly. This paper presents Courier, a tool chain and a domain specific language for runtime binary acceleration, designed to simplify many of the steps involved in accelerating an application. The Courier tool chain can extract dataflow from a running software binary, explore the off-loaded execution time on an accelerator, and then actually accelerate the original binary. By utilizing Courier, both expert and non-expert users can easily extract system-level parallelism and decide which parts should be off-loaded to accelerators in a mixed software-hardware environment, without special knowledge of the target application source code or the accelerator architecture. In a case study, an OpenCV application is accelerated by a factor of 2.06 using Courier, without requiring the application source code or any re-compilation of the application.

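
The abstract above mentions exploring off-loaded execution time before deciding which parts to move to an accelerator. A rough Python sketch of such an estimate follows. It does not use Courier's language or tools: the function `plan_offload`, the stage names, and every timing are hypothetical, and the model assumes stages run one after another with a fixed transfer cost per off-loaded stage.

```python
def plan_offload(stages):
    """Choose which stages to off-load and estimate the overall speedup.

    stages: list of (name, cpu_ms, accel_ms, transfer_ms) tuples, where all
    timings are assumed to come from profiling (hypothetical here).
    """
    offloaded = []
    cpu_only_ms = 0.0
    mixed_ms = 0.0
    for name, cpu_ms, accel_ms, transfer_ms in stages:
        cpu_only_ms += cpu_ms
        accel_total_ms = accel_ms + transfer_ms
        if accel_total_ms < cpu_ms:
            # Off-loading wins even after paying for data transfer.
            offloaded.append(name)
            mixed_ms += accel_total_ms
        else:
            # Transfer cost outweighs the kernel speedup: keep it on the CPU.
            mixed_ms += cpu_ms
    return offloaded, cpu_only_ms / mixed_ms


# Made-up profile of a three-stage image pipeline.
profile = [
    ("decode", 10.0, 12.0, 4.0),
    ("blur", 40.0, 5.0, 5.0),
    ("edges", 30.0, 4.0, 6.0),
]
chosen, speedup = plan_offload(profile)
print(chosen, round(speedup, 2))
```

A per-stage comparison like this ignores overlap between CPU and accelerator work, so it understates what a dataflow-aware tool could achieve when independent stages run concurrently.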