Target Processor


The Experts below are selected from a list of 192 Experts worldwide, ranked by the ideXlab platform.

Sang Lyul Min - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Worst Case Timing Analysis of Data Caching
    IEEE, 1996
    Co-Authors: Sung-kwan Kim, Sang Lyul Min
    Abstract:

    Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC Processor is used as the Target Processor of a real-time system. However, there has not been much progress in worst case timing analysis of data caching. This is mainly due to load/store instructions that reference multiple memory locations, such as those used to implement array and pointer-based references. These load/store instructions are called dynamic load/store instructions, and most current analysis techniques take a very conservative approach to their timing analysis. In many cases, it is assumed that each of the references from a dynamic load/store instruction will miss in the cache and replace a cache block that would otherwise lead to a cache hit. This conservative approach results in severe overestimation of the worst case execution time (WCET). This paper proposes two techniques to minimize the WCET overestimation due to such load/store instructions. The first technique uses a global data flow analysis technique to reduce the number of load/store instructions that are misclassified as dynamic load/store instructions. The second technique utilizes data dependence analysis to minimize the adverse impact of dynamic load/store instructions. This paper also compares the WCET bounds of simple benchmark programs that are predicted with and without applying the proposed techniques. The results show that they significantly (up to 20%) improve the accuracy of WCET estimation, especially for programs with a large number of references from dynamic load/store instructions.
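    The cost of the conservative assumption can be illustrated with a toy model (a hedged sketch, not the paper's algorithm): every access from an unresolved "dynamic" load is charged as a miss, while a load whose address a data-flow analysis can resolve is charged one miss followed by hits. The cycle counts and the one-miss-then-hit model are invented for illustration.

    ```python
    HIT, MISS = 1, 10  # illustrative cycle costs, not from the paper

    def wcet_bound(loads, refine=False):
        # loads: list of (address or None, execution count); None marks a load
        # whose target address the analysis could not resolve ("dynamic").
        total = 0
        for addr, count in loads:
            if addr is None or not refine:
                total += MISS * count              # conservative: every access misses
            else:
                total += MISS + HIT * (count - 1)  # first access misses, the rest hit
        return total

    loads = [(0x100, 50), (0x200, 50), (None, 10)]
    print(wcet_bound(loads))               # conservative bound
    print(wcet_bound(loads, refine=True))  # tighter bound after reclassification
    ```

    In this toy model, reclassifying the two resolvable loads shrinks the bound substantially even though the unresolved load is still charged conservatively, which mirrors the kind of overestimation reduction the paper targets.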

W C Newman - One of the best experts on this subject based on the ideXlab platform.

  • Direct Synthesis of Optimized DSP Assembly Code from Signal Flow Block Diagrams
    International Conference on Acoustics Speech and Signal Processing, 1992
    Co-Authors: D B Powell, Edward A Lee, W C Newman
    Abstract:

    Block diagrams with signal flow semantics have proven their utility in system simulation and algorithm development. They can also be used as high-level languages for real-time system implementation and design. An approach to synthesizing optimized assembly code for programmable DSPs from block diagrams is described. The extensible block library defines code segments in a meta-assembly language that uses the syntax of the assembly code of the Target Processor, but symbolically references registers and memory. An optimizing code generator compiles these segments together, allocates registers and memory, and inserts data movement instructions as needed to produce optimized assembly code. In exchange for Target-Processor dependence in both the code generator and the block library, the system produces assembly code that can closely match the efficiency of hand-written code.
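    The meta-assembly idea can be sketched roughly as follows (a hypothetical illustration, not the actual system): library segments are written with symbolic register names, and a generator stitches segments together while binding each symbol to a physical register. The mnemonics, register names, and the naive first-free allocator are all assumptions made for the sketch.

    ```python
    import itertools
    import re

    PHYS_REGS = ["r0", "r1", "r2", "r3"]  # invented register file

    def generate(segments):
        """Concatenate meta-assembly segments, binding %symbols to registers."""
        binding, free = {}, list(PHYS_REGS)
        out = []
        for line in itertools.chain.from_iterable(segments):
            def bind(m):
                sym = m.group(0)
                if sym not in binding:
                    binding[sym] = free.pop(0)  # naive first-free allocation
                return binding[sym]
            out.append(re.sub(r"%\w+", bind, line))
        return out

    # Two made-up library segments: one FIR tap and a store of the result.
    fir_tap = ["mul %acc, %coef, %sample", "add %sum, %sum, %acc"]
    store   = ["mov mem[out], %sum"]
    for line in generate([fir_tap, store]):
        print(line)
    ```

    Note how `%sum` keeps the same physical register across segment boundaries, which is the point of compiling the segments together rather than in isolation; a real generator would also spill registers and insert data movement instructions, which this sketch omits.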

Enrique S Quintanaorti - One of the best experts on this subject based on the ideXlab platform.

  • Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
    Cluster Computing, 2016
    Co-Authors: Sandra Catalan, Francisco D Igual, Rafael Mayo, Rafael Rodriguezsanchez, Enrique S Quintanaorti
    Abstract:

    Asymmetric multicore Processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications on clusters of commodity systems-on-chip. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the Target Processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
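    The asymmetric-static idea, partitioning work between the clusters in proportion to their relative throughput, can be sketched as follows. This is a simplified illustration, not BLIS code, and the per-core GFLOPS rates are placeholders rather than measurements of the Exynos 5422.

    ```python
    def static_partition(n_iters, big_cores=4, little_cores=4,
                         big_gflops=3.0, little_gflops=1.0):
        """Split the iterations of gemm's outer micro-kernel loop between the
        big and LITTLE clusters in proportion to their aggregate throughput.
        The core counts and GFLOPS figures are illustrative placeholders."""
        big_cap = big_cores * big_gflops
        little_cap = little_cores * little_gflops
        n_big = round(n_iters * big_cap / (big_cap + little_cap))
        return n_big, n_iters - n_big  # (iterations for big, for LITTLE)

    print(static_partition(1000))
    ```

    A static split like this avoids runtime scheduling overhead but relies on calibrated throughput ratios; the paper's dynamic strategy would instead hand out micro-kernel tasks on demand, trading some overhead for robustness to the calibration.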

  • Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors
    arXiv: Performance, 2015
    Co-Authors: Sandra Catalan, Francisco D Igual, Rafael Mayo, Rafael Rodriguezsanchez, Enrique S Quintanaorti
    Abstract:

    Asymmetric multicore Processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the Target Processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.

Sung-kwan Kim - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Worst Case Timing Analysis of Data Caching
    IEEE, 1996
    Co-Authors: Sung-kwan Kim, Sang Lyul Min
    Abstract:

    Recent progress in worst case timing analysis of programs has made it possible to perform accurate timing analysis of pipelined execution and instruction caching, which is necessary when a RISC Processor is used as the Target Processor of a real-time system. However, there has not been much progress in worst case timing analysis of data caching. This is mainly due to load/store instructions that reference multiple memory locations, such as those used to implement array and pointer-based references. These load/store instructions are called dynamic load/store instructions, and most current analysis techniques take a very conservative approach to their timing analysis. In many cases, it is assumed that each of the references from a dynamic load/store instruction will miss in the cache and replace a cache block that would otherwise lead to a cache hit. This conservative approach results in severe overestimation of the worst case execution time (WCET). This paper proposes two techniques to minimize the WCET overestimation due to such load/store instructions. The first technique uses a global data flow analysis technique to reduce the number of load/store instructions that are misclassified as dynamic load/store instructions. The second technique utilizes data dependence analysis to minimize the adverse impact of dynamic load/store instructions. This paper also compares the WCET bounds of simple benchmark programs that are predicted with and without applying the proposed techniques. The results show that they significantly (up to 20%) improve the accuracy of WCET estimation, especially for programs with a large number of references from dynamic load/store instructions.

D B Powell - One of the best experts on this subject based on the ideXlab platform.

  • Direct Synthesis of Optimized DSP Assembly Code from Signal Flow Block Diagrams
    International Conference on Acoustics Speech and Signal Processing, 1992
    Co-Authors: D B Powell, Edward A Lee, W C Newman
    Abstract:

    Block diagrams with signal flow semantics have proven their utility in system simulation and algorithm development. They can also be used as high-level languages for real-time system implementation and design. An approach to synthesizing optimized assembly code for programmable DSPs from block diagrams is described. The extensible block library defines code segments in a meta-assembly language that uses the syntax of the assembly code of the Target Processor, but symbolically references registers and memory. An optimizing code generator compiles these segments together, allocates registers and memory, and inserts data movement instructions as needed to produce optimized assembly code. In exchange for Target-Processor dependence in both the code generator and the block library, the system produces assembly code that can closely match the efficiency of hand-written code.