processing element

The Experts below are selected from a list of 303 Experts worldwide ranked by ideXlab platform

Jakub Kurzak - One of the best experts on this subject based on the ideXlab platform.

  • fast and small short vector simd matrix multiplication kernels for the synergistic processing element of the cell processor
    International Conference on Computational Science, 2008
    Co-Authors: Wesley Alvaro, Jakub Kurzak
    Abstract:

    Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single-precision floating-point performance. In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data architecture of the Synergistic processing element of the CELL processor. In this paper, single-precision matrix multiplication kernels are presented implementing the C = C − A × Bᵀ operation and the C = C − A × B operation for matrices of size 64 × 64 elements. For the latter case, the performance of 25.55 Gflop/s is reported, or 99.80 percent of the peak, using as little as 5.9 KB of storage for code and auxiliary data structures.
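
    As a rough illustration of the operation named in the abstract, the sketch below is a plain scalar reference for the C = C − A × Bᵀ tile update, together with the arithmetic behind the "99.80 percent of the peak" figure. It is not the authors' SIMD kernel. The function names are hypothetical, and the peak of 25.6 Gflop/s is an assumption (a 3.2 GHz clock, 4 single-precision lanes, and a fused multiply-add counted as 2 flops) that is consistent with the reported numbers but not stated in the abstract.

```python
# Scalar reference for the tile update described in the abstract.
# Names and the peak-rate parameters below are illustrative assumptions.

N = 64  # tile size quoted in the abstract


def gemm_sub_nt(C, A, B):
    """Update C in place with C = C - A * B^T (square lists of lists)."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # B is accessed by row j, which is what the transpose means here.
                acc += A[i][k] * B[j][k]
            C[i][j] -= acc
    return C


def flop_count(n):
    """One multiply and one add per inner iteration: 2 * n^3 flops."""
    return 2 * n ** 3


# Assumed peak: 3.2 GHz * 4 SIMD lanes * 2 flops per fused multiply-add.
ASSUMED_PEAK_GFLOPS = 3.2 * 4 * 2
REPORTED_GFLOPS = 25.55  # from the abstract


def percent_of_peak(achieved, peak=ASSUMED_PEAK_GFLOPS):
    return 100.0 * achieved / peak


if __name__ == "__main__":
    print(flop_count(N))                                # flops per 64x64 tile update
    print(round(percent_of_peak(REPORTED_GFLOPS), 2))   # share of the assumed peak
```

    Under the assumed peak, 25.55 Gflop/s works out to roughly 99.8 percent, matching the figure quoted above.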

Piotr Dudek - One of the best experts on this subject based on the ideXlab platform.

  • ISCAS - A processor element for a mixed signal cellular processor array vision chip
    2011 IEEE International Symposium of Circuits and Systems (ISCAS), 2011
    Co-Authors: Stephen J. Carey, Alexey Lopich, Piotr Dudek
    Abstract:

    A combined analogue and digital processing element for a pixel-parallel vision chip has been designed in 0.18µm CMOS technology. In addition to 7 analogue registers, each pixel incorporates 14 bits of digital memory. In the analogue domain its processing capabilities include addition, subtraction and squaring, with digital domain NOT and OR operators also available. The processing element has dimensions of 32×32µm and is designed to operate at 10MHz. A test chip has been fabricated.

  • A processing element for an Analogue SIMD Vision Chip
    2003
    Co-Authors: Piotr Dudek
    Abstract:

    This paper describes an analogue processing element (APE) suitable for high-density image sensor/processor array integrated circuits. The design trade-offs between area, power consumption, speed and accuracy are discussed and the architecture of the APE is presented. The design follows a switched-current "analogue microprocessor" approach while the implementation of arithmetic operations is simplified by introducing a register-based current division method. The circuit has been implemented in a 0.35µm single-poly 3-metal layer CMOS technology. The APE measures below 50µm×50µm, operates with a 1 MHz clock and consumes less than 12µW of power (simulation results).

Wesley Alvaro - One of the best experts on this subject based on the ideXlab platform.

  • fast and small short vector simd matrix multiplication kernels for the synergistic processing element of the cell processor
    International Conference on Computational Science, 2008
    Co-Authors: Wesley Alvaro, Jakub Kurzak
    Abstract:

    Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single-precision floating-point performance. In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data architecture of the Synergistic processing element of the CELL processor. In this paper, single-precision matrix multiplication kernels are presented implementing the C = C − A × Bᵀ operation and the C = C − A × B operation for matrices of size 64 × 64 elements. For the latter case, the performance of 25.55 Gflop/s is reported, or 99.80 percent of the peak, using as little as 5.9 KB of storage for code and auxiliary data structures.

J. Condorodis - One of the best experts on this subject based on the ideXlab platform.

  • ICASSP - A VLSI design of processing element for reconfigurable systolic architectures based on LNS
    ICASSP-88. International Conference on Acoustics Speech and Signal Processing, 1
    Co-Authors: George M. Papadourakis, J. Condorodis
    Abstract:

    The design and development of a processing element (PE) in an orthogonal systolic architecture, using the state of the art in VLSI technology, is presented. The goal was to create a high-speed, high-precision PE which would be adaptive to a highly configurable systolic architecture. In order to achieve the necessary computational throughput, the arithmetic unit of the PE was implemented using the logarithmic number system. The PE is designed to take full advantage of parallel communications, both internally and externally.
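
    To make the throughput argument concrete, here is a minimal sketch of logarithmic number system (LNS) arithmetic in general: a value is stored as a sign and the base-2 logarithm of its magnitude, so multiplication and division reduce to addition and subtraction of the stored logarithms. The abstract does not give the PE's word format or its addition scheme, so everything below (the tuple representation, the function names, and the use of floating point instead of fixed-point hardware words) is an assumption for illustration only.

```python
import math

# Illustrative LNS model: a nonzero value x is held as (sign, log2(|x|)).
# This representation and these helper names are hypothetical.


def to_lns(x):
    """Encode a nonzero real number as (sign, log2 of magnitude)."""
    if x == 0:
        raise ValueError("zero has no finite logarithm; real designs use a flag")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Decode (sign, exponent) back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent


def lns_mul(a, b):
    """Multiplication becomes one addition of the log parts."""
    return (a[0] * b[0], a[1] + b[1])


def lns_div(a, b):
    """Division becomes one subtraction of the log parts."""
    return (a[0] * b[0], a[1] - b[1])


def lns_add_same_sign(a, b):
    """Addition needs the nonlinear term log2(1 + 2^d), with d <= 0.

    Only the same-sign case is sketched; hardware would typically
    approximate this term rather than compute it exactly.
    """
    if a[0] != b[0]:
        raise ValueError("mixed-sign addition is not covered by this sketch")
    hi, lo = max(a[1], b[1]), min(a[1], b[1])
    return (a[0], hi + math.log2(1.0 + 2.0 ** (lo - hi)))


if __name__ == "__main__":
    product = lns_mul(to_lns(8.0), to_lns(-4.0))
    print(from_lns(product))
    total = lns_add_same_sign(to_lns(8.0), to_lns(8.0))
    print(from_lns(total))
```

    The trade-off this shows is the usual one for LNS: multiply and divide are cheap, while add and subtract need an extra nonlinear function.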

I. Cumming - One of the best experts on this subject based on the ideXlab platform.