Target Processor


The Experts below are selected from a list of 192 Experts worldwide, ranked by the ideXlab platform.

Sang Lyul Min - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Worst Case Timing Analysis of Data Caching
    IEEE, 1996
    Co-Authors: Sung-kwan Kim, Sang Lyul Min
    Abstract:

    Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC Processor is used as the Target Processor of a real-time system. However, there has not been much progress in worst case timing analysis of data caching. This is mainly due to load/store instructions that reference multiple memory locations, such as those used to implement array and pointer-based references. These load/store instructions are called dynamic load/store instructions, and most current analysis techniques take a very conservative approach to their timing analysis. In many cases, it is assumed that each of the references from a dynamic load/store instruction will miss in the cache and replace a cache block that would otherwise lead to a cache hit. This conservative approach results in severe overestimation of the worst case execution time (WCET). This paper proposes two techniques to minimize the WCET overestimation due to such load/store instructions. The first technique uses a global data flow analysis technique to reduce the number of load/store instructions that are misclassified as dynamic load/store instructions. The second technique utilizes data dependence analysis to minimize the adverse impact of dynamic load/store instructions. This paper also compares the WCET bounds of simple benchmark programs that are predicted with and without applying the proposed techniques. The results show that they significantly (up to 20%) improve the accuracy of WCET estimation, especially for programs with a large number of references from dynamic load/store instructions.
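    The cost of the conservative assumption can be illustrated with a toy model (a hedged sketch, not the paper's algorithm): every access from an unresolved "dynamic" load is charged as a miss, while a load whose address a data-flow analysis can resolve is charged one miss followed by hits. The cycle counts and the one-miss-then-hit model are invented for illustration.

    ```python
    HIT, MISS = 1, 10  # illustrative cycle costs, not from the paper

    def wcet_bound(loads, refine=False):
        # loads: list of (address or None, execution count); None marks a load
        # whose target address the analysis could not resolve ("dynamic").
        total = 0
        for addr, count in loads:
            if addr is None or not refine:
                total += MISS * count              # conservative: every access misses
            else:
                total += MISS + HIT * (count - 1)  # first access misses, the rest hit
        return total

    loads = [(0x100, 50), (0x200, 50), (None, 10)]
    print(wcet_bound(loads))               # conservative bound
    print(wcet_bound(loads, refine=True))  # tighter bound after reclassification
    ```

    In this toy model, reclassifying the two resolvable loads shrinks the bound substantially even though the unresolved load is still charged conservatively, which mirrors the kind of overestimation reduction the paper targets.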

W C Newman - One of the best experts on this subject based on the ideXlab platform.

  • Direct Synthesis of Optimized DSP Assembly Code from Signal Flow Block Diagrams
    International Conference on Acoustics Speech and Signal Processing, 1992
    Co-Authors: D B Powell, Edward A Lee, W C Newman
    Abstract:

    Block diagrams with signal flow semantics have proven their utility in system simulation and algorithm development. They can also be used as high-level languages for real-time system implementation and design. An approach to synthesizing optimized assembly code for programmable DSPs from block diagrams is described. The extensible block library defines code segments in a meta-assembly language that uses the syntax of the assembly code of the Target Processor, but symbolically references registers and memory. An optimizing code generator compiles these segments together, allocates registers and memory, and inserts data movement instructions as needed to produce optimized assembly code. In exchange for Target-Processor dependence in both the code generator and the block library, the system produces assembly code that can closely match the efficiency of hand-written code.
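    The meta-assembly idea can be sketched roughly as follows (a hypothetical illustration, not the actual system): library segments are written with symbolic register names, and a generator stitches segments together while binding each symbol to a physical register. The mnemonics, register names, and the naive first-free allocator are all assumptions made for the sketch.

    ```python
    import itertools
    import re

    PHYS_REGS = ["r0", "r1", "r2", "r3"]  # invented register file

    def generate(segments):
        """Concatenate meta-assembly segments, binding %symbols to registers."""
        binding, free = {}, list(PHYS_REGS)
        out = []
        for line in itertools.chain.from_iterable(segments):
            def bind(m):
                sym = m.group(0)
                if sym not in binding:
                    binding[sym] = free.pop(0)  # naive first-free allocation
                return binding[sym]
            out.append(re.sub(r"%\w+", bind, line))
        return out

    # Two made-up library segments: one FIR tap and a store of the result.
    fir_tap = ["mul %acc, %coef, %sample", "add %sum, %sum, %acc"]
    store   = ["mov mem[out], %sum"]
    for line in generate([fir_tap, store]):
        print(line)
    ```

    Note how `%sum` keeps the same physical register across segment boundaries, which is the point of compiling the segments together rather than in isolation; a real generator would also spill registers and insert data movement instructions, which this sketch omits.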

Enrique S Quintanaorti - One of the best experts on this subject based on the ideXlab platform.

  • Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
    Cluster Computing, 2016
    Co-Authors: Sandra Catalan, Francisco D Igual, Rafael Mayo, Rafael Rodriguezsanchez, Enrique S Quintanaorti
    Abstract:

    Asymmetric multicore Processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications on clusters of commodity systems-on-chip. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the Target Processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
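    The asymmetric-static idea, partitioning work between the clusters in proportion to their relative throughput, can be sketched as follows. This is a simplified illustration, not BLIS code, and the per-core GFLOPS rates are placeholders rather than measurements of the Exynos 5422.

    ```python
    def static_partition(n_iters, big_cores=4, little_cores=4,
                         big_gflops=3.0, little_gflops=1.0):
        """Split the iterations of gemm's outer micro-kernel loop between the
        big and LITTLE clusters in proportion to their aggregate throughput.
        The core counts and GFLOPS figures are illustrative placeholders."""
        big_cap = big_cores * big_gflops
        little_cap = little_cores * little_gflops
        n_big = round(n_iters * big_cap / (big_cap + little_cap))
        return n_big, n_iters - n_big  # (iterations for big, for LITTLE)

    print(static_partition(1000))
    ```

    A static split like this avoids runtime scheduling overhead but relies on calibrated throughput ratios; the paper's dynamic strategy would instead hand out micro-kernel tasks on demand, trading some overhead for robustness to the calibration.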

  • Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
    arXiv: Performance, 2015
    Co-Authors: Sandra Catalan, Francisco D Igual, Rafael Mayo, Rafael Rodriguezsanchez, Enrique S Quintanaorti
    Abstract:

    Asymmetric multicore Processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the Target Processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.

Sung-kwan Kim - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Worst Case Timing Analysis of Data Caching
    IEEE, 1996
    Co-Authors: Sung-kwan Kim, Sang Lyul Min
    Abstract:

    Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC Processor is used as the Target Processor of a real-time system. However, there has not been much progress in worst case timing analysis of data caching. This is mainly due to load/store instructions that reference multiple memory locations, such as those used to implement array and pointer-based references. These load/store instructions are called dynamic load/store instructions, and most current analysis techniques take a very conservative approach to their timing analysis. In many cases, it is assumed that each of the references from a dynamic load/store instruction will miss in the cache and replace a cache block that would otherwise lead to a cache hit. This conservative approach results in severe overestimation of the worst case execution time (WCET). This paper proposes two techniques to minimize the WCET overestimation due to such load/store instructions. The first technique uses a global data flow analysis technique to reduce the number of load/store instructions that are misclassified as dynamic load/store instructions. The second technique utilizes data dependence analysis to minimize the adverse impact of dynamic load/store instructions. This paper also compares the WCET bounds of simple benchmark programs that are predicted with and without applying the proposed techniques. The results show that they significantly (up to 20%) improve the accuracy of WCET estimation, especially for programs with a large number of references from dynamic load/store instructions.

D B Powell - One of the best experts on this subject based on the ideXlab platform.

  • Direct Synthesis of Optimized DSP Assembly Code from Signal Flow Block Diagrams
    International Conference on Acoustics Speech and Signal Processing, 1992
    Co-Authors: D B Powell, Edward A Lee, W C Newman
    Abstract:

    Block diagrams with signal flow semantics have proven their utility in system simulation and algorithm development. They can also be used as high-level languages for real-time system implementation and design. An approach to synthesizing optimized assembly code for programmable DSPs from block diagrams is described. The extensible block library defines code segments in a meta-assembly language that uses the syntax of the assembly code of the Target Processor, but symbolically references registers and memory. An optimizing code generator compiles these segments together, allocates registers and memory, and inserts data movement instructions as needed to produce optimized assembly code. In exchange for Target-Processor dependence in both the code generator and the block library, the system produces assembly code that can closely match the efficiency of hand-written code.