processing element - Explore the Science & Experts

The Experts below are selected from a list of 303 Experts worldwide ranked by ideXlab platform

Jakub Kurzak - One of the best experts on this subject based on the ideXlab platform.

Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor

International Conference on Computational Science, 2008

Co-Authors: Wesley Alvaro, Jakub Kurzak

Abstract:

Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single-precision floating-point performance. In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, a fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data architecture of the Synergistic Processing Element of the CELL processor. In this paper, single-precision matrix multiplication kernels are presented implementing the C = C - A × B^T operation and the C = C - A × B operation for matrices of size 64 × 64 elements. For the latter case, a performance of 25.55 Gflop/s is reported, or 99.80 percent of the peak, using as little as 5.9 KB of storage for code and auxiliary data structures.
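To make the two tile updates named in the abstract concrete, the following is a minimal scalar reference sketch in Python. It is not the authors' SIMD kernel: the function names (`gemm_nt_update`, `gemm_nn_update`, `flops_per_tile`, `implied_peak_gflops`) are hypothetical, and the code only illustrates what the operations compute and how the quoted 25.55 Gflop/s and 99.80 percent figures relate to each other.

```python
N = 64  # tile size quoted in the abstract


def gemm_nt_update(C, A, B, n=N):
    """In-place C = C - A x B^T on n x n matrices stored as lists of rows."""
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # B is used transposed, so row j of B is read, not column j.
                acc += A[i][k] * B[j][k]
            C[i][j] -= acc
    return C


def gemm_nn_update(C, A, B, n=N):
    """In-place C = C - A x B on n x n matrices stored as lists of rows."""
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += A[i][k] * B[k][j]
            C[i][j] -= acc
    return C


def flops_per_tile(n=N):
    """One multiply and one add per inner-loop step: 2 * n^3 operations."""
    return 2 * n ** 3


def implied_peak_gflops(achieved=25.55, fraction_of_peak=0.9980):
    """Peak rate implied by the two figures quoted in the abstract."""
    return achieved / fraction_of_peak
```

With these definitions, `flops_per_tile()` gives 524288 floating-point operations per 64 × 64 tile update, and `implied_peak_gflops()` comes out at roughly 25.6 Gflop/s for a single processing element.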

ICCS (1) - Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor

Computational Science – ICCS 2008, 2008

Co-Authors: Wesley Alvaro, Jakub Kurzak


Piotr Dudek - One of the best experts on this subject based on the ideXlab platform.

ISCAS - A processor element for a mixed signal cellular processor array vision chip

2011 IEEE International Symposium of Circuits and Systems (ISCAS), 2011

Co-Authors: Stephen J. Carey, Alexey Lopich, Piotr Dudek

Abstract:

A combined analogue and digital processing element for a pixel-parallel vision chip has been designed in 0.18µm CMOS technology. In addition to 7 analogue registers, each pixel incorporates 14 bits of digital memory. In the analogue domain its processing capabilities include addition, subtraction and squaring, with digital domain NOT and OR operators also available. The processing element has dimensions of 32×32µm and is designed to operate at 10MHz. A test chip has been fabricated.
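As an illustration of why such a small operation set can still be useful, the sketch below uses two standard identities: NOT and OR are enough to build AND (De Morgan), and addition, subtraction and squaring are enough to build a product (the quarter-square identity). This is an assumption-laden illustration in Python, not a description of how the chip is programmed; the abstract does not state that these constructions are used, and all function names are hypothetical.

```python
def bit_not(a):
    """Logical NOT on a single bit (0 or 1)."""
    return 1 - a


def bit_or(a, b):
    """Logical OR on two bits."""
    return a | b


def bit_and(a, b):
    """AND built only from NOT and OR: a AND b = NOT(NOT a OR NOT b)."""
    return bit_not(bit_or(bit_not(a), bit_not(b)))


def multiply_via_squares(a, b):
    """Product built from add, subtract and square:
    a * b = ((a + b)^2 - (a - b)^2) / 4.
    """
    return ((a + b) ** 2 - (a - b) ** 2) / 4
```

For example, `multiply_via_squares(3, 5)` evaluates (64 - 4) / 4, which is the product of the two inputs.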

A processing element for an Analogue SIMD Vision Chip

2003

Co-Authors: Piotr Dudek

Abstract:

This paper describes an analogue processing element (APE) suitable for high-density image sensor/processor array integrated circuits. The design trade-offs between area, power consumption, speed and accuracy are discussed and the architecture of the APE is presented. The design follows a switched-current "analogue microprocessor" approach while the implementation of arithmetic operations is simplified by introducing a register-based current division method. The circuit has been implemented in a 0.35µm single-poly 3-metal layer CMOS technology. The APE measures below 50µm×50µm, operates with a 1 MHz clock and consumes less than 12µW of power (simulation results).


Wesley Alvaro - One of the best experts on this subject based on the ideXlab platform.

Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor

International Conference on Computational Science, 2008

Co-Authors: Wesley Alvaro, Jakub Kurzak

ICCS (1) - Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor

Computational Science – ICCS 2008, 2008

Co-Authors: Wesley Alvaro, Jakub Kurzak


J. Condorodis - One of the best experts on this subject based on the ideXlab platform.

ICASSP - A VLSI design of processing element for reconfigurable systolic architectures based on LNS

ICASSP-88. International Conference on Acoustics Speech and Signal Processing, 1

Co-Authors: George M. Papadourakis, J. Condorodis

Abstract:

The design and development of a processing element (PE) in an orthogonal systolic architecture, using the state of the art in VLSI technology, is presented. The goal was to create a high-speed, high-precision PE which would be adaptive to a highly configurable systolic architecture. In order to achieve the necessary computational throughput, the arithmetic unit of the PE was implemented using the logarithmic number system. The PE is designed to take full advantage of parallel communications, both internally and externally.
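The appeal of a logarithmic number system (LNS) for a multiply-accumulate PE can be sketched in a few lines: a value is held as a sign plus the base-2 logarithm of its magnitude, so multiplication reduces to adding exponents, while addition needs a correction term of the form log2(1 ± 2^d). The Python below is a floating-point illustration of that arithmetic only, under the assumption of an idealised, unquantised log representation; it does not model the word lengths, lookup tables, or circuit of the PE described in the abstract, and every function name is hypothetical.

```python
import math


def to_lns(x):
    """Encode a nonzero real as (sign, log2 of magnitude)."""
    if x == 0:
        # Zero has no logarithm; a hardware design needs a special encoding.
        raise ValueError("zero is not representable in this sketch")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Decode (sign, log2 of magnitude) back to a real number."""
    sign, log_mag = v
    return sign * 2.0 ** log_mag


def lns_mul(a, b):
    """Multiplication: multiply the signs, add the logarithms."""
    return (a[0] * b[0], a[1] + b[1])


def lns_add(a, b):
    """Addition: larger log plus a correction log2(1 +/- 2^d), with d <= 0."""
    if a[1] < b[1]:
        a, b = b, a  # make `a` the operand with the larger magnitude
    d = b[1] - a[1]
    if a[0] == b[0]:
        correction = math.log2(1.0 + 2.0 ** d)
    else:
        if d == 0:
            raise ValueError("exact cancellation yields zero")
        correction = math.log2(1.0 - 2.0 ** d)
    return (a[0], a[1] + correction)


def lns_mac(acc, x, w):
    """One multiply-accumulate step, the basic operation of a systolic PE."""
    return lns_add(acc, lns_mul(x, w))
```

A usage example: `from_lns(lns_mac(to_lns(1.0), to_lns(2.0), to_lns(3.0)))` accumulates 1 + 2 × 3 entirely in the log domain before converting back. The design trade-off this sketch exposes is that the multiply path is a plain adder, while the add path is where the nonlinear correction has to be computed or approximated.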


I. Cumming - One of the best experts on this subject based on the ideXlab platform.

ICASSP - A programmable signal processing element designed for an efficient data-driven signal processing architecture

ICASSP '84. IEEE International Conference on Acoustics Speech and Signal Processing, 1

Co-Authors: J. Lim, G. Kalanj, I. Cumming

Abstract:

This paper describes the architecture of a Programmable Signal Processing Element (PSPE) which has been designed to serve as a building block in a High Throughput Signal Processor (HTSP). The HTSP has been designed for Synthetic Aperture Radar (SAR) processing applications and itself contains a number of modern design features which allow it to meet the demanding performance requirements of SAR processing. The principle of the HTSP architecture will also be presented in the paper.

