Runtime Binary

The experts below are selected from a list of 51 experts worldwide, ranked by the ideXlab platform.

Penchung Yew - One of the best experts on this subject based on the ideXlab platform.

COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications

International Conference on Parallel Processing (ICPP 2007), 2007

Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew

Abstract:

This paper presents COBRA (Continuous Binary Re-Adaptation), a runtime binary optimization framework for multithreaded applications. It is currently implemented on Itanium 2 based SMP and cc-NUMA systems. Using the OpenMP NAS parallel benchmarks, we show how COBRA can adaptively choose appropriate optimizations according to the observed, changing runtime behavior of the program. Coherent cache misses caused by true/false data sharing often limit the scalability of multithreaded applications. This paper shows that COBRA can significantly improve the performance of some applications parallelized with OpenMP by reducing the aggressiveness of data prefetching and by using exclusive hints for prefetch instructions. For example, we show that COBRA can improve the performance of the OpenMP NAS parallel benchmarks by up to 68%, with an average of 17.5%, on the SGI Altix cc-NUMA system.
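As a rough illustration of the adaptive idea described in the abstract, the sketch below picks a prefetch policy for each sampling interval from the share of cache misses that are coherent misses. It is not COBRA's actual algorithm: the `Sample` counters, the policy labels, and the two thresholds are hypothetical and chosen only to make the decision loop concrete.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical per-interval counters for one hot loop."""
    l2_misses: int        # all cache misses observed in the interval
    coherent_misses: int  # misses caused by cache-line sharing between threads


def choose_prefetch_policy(sample: Sample,
                           high: float = 0.5,
                           low: float = 0.2) -> str:
    """Return a policy label from the fraction of misses that are coherent.

    The thresholds `high` and `low` are illustrative assumptions, not
    values reported in the paper.
    """
    if sample.l2_misses == 0:
        return "keep"  # an idle interval gives no evidence either way
    ratio = sample.coherent_misses / sample.l2_misses
    if ratio >= high:
        return "reduce_prefetch"  # make data prefetching less aggressive
    if ratio >= low:
        return "exclusive_hint"   # keep prefetching, add exclusive hints
    return "keep"


def adapt(samples):
    """Re-evaluate the policy once per sampling interval."""
    return [choose_prefetch_policy(s) for s in samples]
```

Calling `adapt` on a sequence of samples returns one label per interval, so a program whose sharing behavior changes over time would receive different policies in different phases.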

James E. Smith - One of the best experts on this subject based on the ideXlab platform.

Reducing Startup Time in Co-Designed Virtual Machines

ACM SIGARCH Computer Architecture News, 2006

Co-Authors: James E. Smith

Abstract:

A co-designed virtual machine allows designers to implement a processor via a combination of hardware and software. Dynamic binary translation converts code written for a conventional (legacy) ISA into optimized code for an underlying implementation-specific ISA. Because translation is done dynamically, an important consideration in such systems is the startup time for performing the initial translations. Beginning with a previously proposed co-designed VM that implements the x86 ISA, we study runtime binary translation overhead effects. The co-designed x86 virtual machine is based on an adaptive translation system that uses a basic block translator for initial emulation and a superblock translator for hotspot optimization. We analyze and model VM startup performance via simulation. We observe that non-hotspot emulation via basic block translation is the major part of the startup overhead. To reduce startup translation overhead, we follow the co-designed hardware/software philosophy and propose hardware assists to dramatically accelerate basic block translations. By combining hardware assists with balanced translation strategies, the co-designed translation system reduces runtime overhead significantly and demonstrates very competitive startup performance when compared with conventional processors running a set of Windows application benchmarks.
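The two-stage scheme in the abstract (a basic block translator for first-time code, a superblock translator for hotspots) can be sketched as a small simulation. This is a toy model under stated assumptions, not the authors' system: the class name `StagedTranslator`, the labels `"bbt"` and `"sbt"`, and the hotspot threshold of 3 executions are all hypothetical.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # assumed value, chosen only for illustration


class StagedTranslator:
    """Toy model of staged translation: cheap first, optimized when hot."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.exec_counts = defaultdict(int)  # block address -> executions
        self.code_cache = {}                 # block address -> "bbt" or "sbt"

    def execute(self, addr):
        """Record one execution of the block at `addr`.

        Returns the label of the translation that serves this execution.
        """
        self.exec_counts[addr] += 1
        if addr not in self.code_cache:
            # First sight of this block: use the quick basic block translator.
            self.code_cache[addr] = "bbt"
        elif (self.code_cache[addr] == "bbt"
              and self.exec_counts[addr] >= self.hot_threshold):
            # The block has become hot: promote it to the superblock translator.
            self.code_cache[addr] = "sbt"
        return self.code_cache[addr]
```

In this model every block pays the basic block translation cost once, which matches the abstract's observation that non-hotspot emulation dominates startup overhead; only blocks that reach the threshold pay for the heavier optimization.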

ISCA - Reducing Startup Time in Co-Designed Virtual Machines

33rd International Symposium on Computer Architecture (ISCA'06), 2006

Co-Authors: James E. Smith


Jinpyo Kim - One of the best experts on this subject based on the ideXlab platform.

COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications

International Conference on Parallel Processing (ICPP 2007), 2007

Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew

Weichung Hsu - One of the best experts on this subject based on the ideXlab platform.

COBRA: An Adaptive Runtime Binary Optimization Framework for Multithreaded Applications

International Conference on Parallel Processing (ICPP 2007), 2007

Co-Authors: Jinpyo Kim, Weichung Hsu, Penchung Yew

Hideharu Amano - One of the best experts on this subject based on the ideXlab platform.

A Domain Specific Language and Toolchain for OpenCV Runtime Binary Acceleration Using GPU

Third International Conference on Networking and Computing (ICNC), 2012

Co-Authors: Takaaki Miyajima, David B Thomas, Hideharu Amano

Abstract:

Computationally intensive applications, such as OpenCV, can be off-loaded to accelerators to reduce execution time. However, developing an accelerated system takes a significant amount of time: the developer must first choose an accelerator and decide which parts to off-load, then port the off-loaded kernels to the accelerator using many accelerator-specific tools. In addition to the low-level parallelism of the accelerator, the developer also needs to extract and utilize the system-level parallelism found within the application, while making sure that the application still executes correctly. This paper presents Courier, a tool chain and a domain specific language for runtime binary acceleration, designed to simplify many of the steps involved in accelerating an application. The Courier tool chain can extract dataflow from a running software binary, explore the off-loaded execution time on an accelerator, and then actually accelerate the original binary. By utilizing Courier, both expert and non-expert users can easily extract system-level parallelism and decide which parts should be off-loaded to accelerators in a mixed software-hardware environment, without special knowledge of the target application source code or accelerator architecture. In a case study, an OpenCV application is accelerated by 2.06 times using Courier, without requiring the application source code or any re-compilation of the application.
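To make the "decide which parts should be off-loaded" step concrete, here is a minimal sketch that compares measured CPU time against an estimated accelerator time plus data transfer cost. It does not reproduce Courier's tool chain or its language: the function `plan_offload`, the profile layout, and every timing value in the usage note are assumptions made up for illustration.

```python
def plan_offload(profile):
    """Pick functions whose estimated off-loaded time beats their CPU time.

    `profile` maps a function name to a tuple
    (cpu_ms, accel_ms, transfer_ms); this layout is a hypothetical one.
    Returns the chosen names, largest estimated saving first.
    """
    savings = {
        name: cpu_ms - (accel_ms + transfer_ms)
        for name, (cpu_ms, accel_ms, transfer_ms) in profile.items()
    }
    # Keep only functions with a positive estimated saving.
    chosen = [name for name, saved in savings.items() if saved > 0]
    return sorted(chosen, key=lambda name: savings[name], reverse=True)
```

For example, with invented timings `{"cvtColor": (4.0, 1.0, 5.0), "GaussianBlur": (30.0, 5.0, 6.0), "Canny": (20.0, 6.0, 6.0)}`, the color conversion stays on the CPU because its transfer cost outweighs the speedup, while the other two functions are selected.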

