The Experts below are selected from a list of 11226 Experts worldwide ranked by the ideXlab platform
Songchun Zhu - One of the best experts on this subject based on the ideXlab platform.
-
sparse winograd convolutional neural networks on small scale Systolic Arrays
Field Programmable Gate Arrays, 2019
Co-Authors: Feng Shi, Yuhe Gao, Benjamin Kuschner, Songchun Zhu
Abstract: The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom balance the high throughput of the compute fabric against the ability of the memory subsystem to sustain it. In this paper, we implement a framework on FPGA that combines sparse Winograd convolution, clusters of small-scale Systolic Arrays, and a tailored recursive Z-Morton memory layout. We also provide an analytical model of the general Winograd convolution algorithm as a design reference. Experimental results on various CNN models show very high computation resource utilization, 20x~30x energy efficiency, and more than 5x speedup compared with the dense implementation.
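The Winograd minimal filtering algorithm the abstract builds on can be illustrated with the smallest 1-D case, F(2,3): two convolution outputs from a 3-tap filter using 4 multiplications instead of the direct method's 6. The transform matrices below are the standard published ones (Lavin and Gray's formulation), not the paper's FPGA datapath; this is a numerical sketch only.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices: Y = A^T [(G g) . (B^T d)],
# where d is a 4-element input tile and g a 3-tap filter.
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of the 1-D convolution of tile d (length 4) with filter g (length 3),
    using only 4 elementwise multiplies."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
# Direct sliding-window reference: y[i] = sum_j g[j] * d[i + j]
direct = np.array([sum(g[j] * d[0 + j] for j in range(3)),
                   sum(g[j] * d[1 + j] for j in range(3))])
```

Sparsity enters by skipping zero entries of the transformed filter `G @ g`, which is where the paper's sparse variant saves work.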
-
sparse winograd convolutional neural networks on small scale Systolic Arrays
arXiv: Distributed Parallel and Cluster Computing, 2018
Co-Authors: Feng Shi, Yuhe Gao, Benjamin Kuschner, Songchun Zhu
Abstract: The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom balance the high throughput of the compute fabric against the ability of the memory subsystem to sustain it. In this paper, we implement an accelerator on FPGA that combines sparse Winograd convolution, clusters of small-scale Systolic Arrays, and a tailored memory layout design. We also provide an analytical model of the general Winograd convolution algorithm as a design reference. Experimental results on VGG16 show very high computational resource utilization, 20x~30x energy efficiency, and more than 5x speedup compared with the dense implementation.
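The "tailored memory layout" here is named recursive Z-Morton in the journal version of this work. The idea of a Z-order (Morton) layout is to interleave the bits of a tile's 2-D coordinates to form its linear address, so that 2-D-adjacent tiles stay close in memory. A minimal sketch of the index computation (a generic illustration, not the accelerator's address generator):

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of (x, y) into a Z-order (Morton) index.

    Z-Morton layouts keep 2-D-adjacent tiles close in linear memory,
    improving locality for blocked access patterns.
    """
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)        # x bits occupy even positions
        idx |= ((y >> b) & 1) << (2 * b + 1)    # y bits occupy odd positions
    return idx
```

Walking (x, y) in Morton order traces the recursive Z-shaped curve that gives the layout its name.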
Mile K. Stojcev - One of the best experts on this subject based on the ideXlab platform.
-
design of linear Systolic Arrays for matrix multiplication
Advances in Electrical and Computer Engineering, 2014
Co-Authors: Emina I. Milovanovic, Igor Z. Milovanovic, Mile K. Stojcev, Tatjana R. Nikolic
Abstract: This paper presents an architecture for matrix multiplication optimized to be integrated as an accelerator unit into a host computer. Two linear Systolic Arrays with unidirectional data flow ...
-
Orthogonal fault-tolerant Systolic Arrays for matrix multiplication
Microelectronics Reliability, 2011
Co-Authors: Igor Z. Milovanovic, Emina I. Milovanovic, Mile K. Stojcev, M. P. Bekakos
Abstract: A systematic approach for designing one class of fault-tolerant Systolic Arrays (FTSAs) with orthogonal interconnects and unidirectional data flow, the Orthogonal Unidirectional Systolic Array (OUSA), for multiplication of rectangular matrices is presented in this paper. The method employs space-time redundancy to achieve fault tolerance. By applying the proposed systematic design procedure, four different Systolic Arrays of the OUSA type are obtained. All the Arrays can tolerate single transient errors, and the majority of multiple errors with high probability. To provide high bandwidth in data access, special hardware called an address generator unit was designed. Hardware complexity and performance gains achieved at the higher (system, algorithm, and architecture) design levels were analyzed. The obtained results show that with n² + 2n processing elements the total execution time of the fault-tolerant algorithm is 6n + 3 time units; the hardware overhead due to fault tolerance ranges from 6.25% down to 0.8%, while the time overhead is 50%. In addition, the hardware-implemented address generation unit reduces the total execution time of the algorithm almost five times compared with software address calculations.
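The space-time redundancy principle the abstract relies on can be sketched in a few lines: repeat a computation in successive time slots and majority-vote the results, so a single transient error in any one slot is masked. This is only the generic idea, not the OUSA array's actual redundancy scheme; `redundant_mac` and its `fault_at` parameter are hypothetical names for illustration.

```python
def vote3(a, b, c):
    # Majority vote: with at most one faulty value, the majority is correct.
    return a if a == b or a == c else b

def redundant_mac(a, b, acc, fault_at=None):
    """Compute acc + a*b three times in successive time slots (time redundancy).

    fault_at injects a transient error into one of the three slots; the
    voter masks it, so the returned result is still correct.
    """
    results = []
    for slot in range(3):
        r = acc + a * b
        if slot == fault_at:
            r += 1  # model a transient bit-flip style error
        results.append(r)
    return vote3(*results)
```

The paper's 50% time overhead corresponds to this kind of trade: extra time slots buy error masking without triplicating the hardware.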
-
Hexagonal Systolic Arrays for matrix multiplication
2001
Co-Authors: M. P. Bekakos, Emina I. Milovanovic, I. Ž. Milovanović, T. I. Tokić, Mile K. Stojcev
Abstract: We consider the problem of matrix multiplication on hexagonal Systolic Arrays (SAs). We begin with a description of the procedure for Systolic array design, which is based on data dependency and space-time mapping of nested loop algorithms. We then introduce some performance measures that are used throughout the chapter to compare various SAs. We proceed with a modification of the standard design procedure that enables the synthesis of Systolic Arrays with the optimal number of processing elements (PEs) for a given problem size and minimal execution time for a given number of PEs. We then analyse and compare different hexagonal Arrays. Further, we show how the execution time of the matrix multiplication algorithm can be reduced if the number of PEs is increased beyond the optimal one. Finally, we address the problem of fault-tolerant matrix multiplication on hexagonal Arrays.
-
two level pipelined Systolic Arrays for matrix vector multiplication
Journal of Systems Architecture, 1998
Co-Authors: Ivan Milentijevic, Emina I. Milovanovic, Igor Z. Milovanovic, Milorad Tosic, Mile K. Stojcev
Abstract: Novel two-level pipelined linear Systolic Arrays for matrix-vector multiplication are proposed. The number of processing elements in the proposed Arrays is reduced to half of that in the existing Arrays. An area-time (AT) criterion is used to compare the proposed Arrays with the fastest existing one.
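As background for the abstract above, a basic linear systolic array for y = A·x can be simulated cycle by cycle: PE j holds x[j], the rows of A stream in skewed by one cycle per PE, and partial sums ripple through the array. This models the conventional baseline design, not the paper's half-PE two-level pipelined arrays.

```python
def systolic_matvec(A, x):
    """Cycle-level model of a linear systolic array computing y = A x.

    PE j stores x[j]; at cycle t it sees matrix element A[t - j][j] and adds
    A[t - j][j] * x[j] to the partial sum arriving from PE j - 1. Results
    emerge from the last PE; the pipeline drains in m + n - 1 cycles.
    """
    m, n = len(A), len(x)
    partial = [0.0] * n          # partial[j]: sum currently held at PE j
    y = []
    for t in range(m + n - 1):
        # shift partial sums one PE to the right, inject a fresh 0 at PE 0
        partial = [0.0] + partial[:-1]
        for j in range(n):
            i = t - j            # row index whose element reaches PE j now
            if 0 <= i < m:
                partial[j] += A[i][j] * x[j]
        i_out = t - (n - 1)      # row whose result leaves the last PE
        if 0 <= i_out < m:
            y.append(partial[n - 1])
    return y
```

For example, `systolic_matvec([[1, 2], [3, 4]], [5, 6])` yields [17, 39], matching the ordinary matrix-vector product.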
-
The Design of Optimal Planar Systolic Arrays for Matrix Multiplication
Computers & Mathematics with Applications, 1997
Co-Authors: Ivan Milentijevic, Emina I. Milovanovic, Igor Z. Milovanovic, Mile K. Stojcev
Abstract: The objective of this paper is to provide a systematic methodology for the design of space-time optimal pure planar Systolic Arrays for matrix multiplication. The procedure is based on the data dependence approach. By the described procedure, we obtain ten different Systolic Arrays, denoted S1 to S10, classified into three classes according to the interconnection patterns between the processing elements. Common properties of all the Systolic array designs are: each Systolic array consists of n² processing elements, near-neighbour communications, and an active execution time of 3n − 2 time units. Compared with designs found in the literature, our procedure always leads to Systolic Arrays with the optimal number of processing elements. The improvement in the space domain is not achieved at the cost of execution time or PE complexity. We present a mathematically rigorous procedure that gives the exact ordering of the input matrix elements at the beginning of the computation. Examples illustrating the methodology are shown.
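The n² PEs / 3n − 2 cycles figures can be reproduced with a behavioral model of a classic output-stationary planar array (this is the textbook design, not necessarily any particular one of the paper's S1 to S10): A's rows enter from the left and B's columns from the top, each skewed by its index, so PE (i, j) sees operand pair k = t − i − j at cycle t.

```python
def systolic_matmul(A, B):
    """Cycle-level model of an n x n output-stationary systolic mesh.

    C[i][j] accumulates in place at PE (i, j). The last PE (n-1, n-1)
    consumes its final operand pair when t - i - j = n - 1, i.e. at
    t = 3n - 3, so the whole product takes 3n - 2 active cycles.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):           # active execution time: 3n - 2 cycles
        for i in range(n):
            for j in range(n):
                k = t - i - j            # operand index reaching PE (i, j) now
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C
```

The skewed start times (PE (i, j) idles for its first i + j cycles) are exactly the "exact ordering of the input matrix elements" the abstract refers to.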
M. P. Bekakos - One of the best experts on this subject based on the ideXlab platform.
-
Orthogonal fault-tolerant Systolic Arrays for matrix multiplication
Microelectronics Reliability, 2011
Co-Authors: Igor Z. Milovanovic, Emina I. Milovanovic, Mile K. Stojcev, M. P. Bekakos
Abstract: A systematic approach for designing one class of fault-tolerant Systolic Arrays (FTSAs) with orthogonal interconnects and unidirectional data flow, the Orthogonal Unidirectional Systolic Array (OUSA), for multiplication of rectangular matrices is presented in this paper. The method employs space-time redundancy to achieve fault tolerance. By applying the proposed systematic design procedure, four different Systolic Arrays of the OUSA type are obtained. All the Arrays can tolerate single transient errors, and the majority of multiple errors with high probability. To provide high bandwidth in data access, special hardware called an address generator unit was designed. Hardware complexity and performance gains achieved at the higher (system, algorithm, and architecture) design levels were analyzed. The obtained results show that with n² + 2n processing elements the total execution time of the fault-tolerant algorithm is 6n + 3 time units; the hardware overhead due to fault tolerance ranges from 6.25% down to 0.8%, while the time overhead is 50%. In addition, the hardware-implemented address generation unit reduces the total execution time of the algorithm almost five times compared with software address calculations.
-
Hexagonal Systolic Arrays for matrix multiplication
2001
Co-Authors: M. P. Bekakos, Emina I. Milovanovic, I. Ž. Milovanović, T. I. Tokić, Mile K. Stojcev
Abstract: We consider the problem of matrix multiplication on hexagonal Systolic Arrays (SAs). We begin with a description of the procedure for Systolic array design, which is based on data dependency and space-time mapping of nested loop algorithms. We then introduce some performance measures that are used throughout the chapter to compare various SAs. We proceed with a modification of the standard design procedure that enables the synthesis of Systolic Arrays with the optimal number of processing elements (PEs) for a given problem size and minimal execution time for a given number of PEs. We then analyse and compare different hexagonal Arrays. Further, we show how the execution time of the matrix multiplication algorithm can be reduced if the number of PEs is increased beyond the optimal one. Finally, we address the problem of fault-tolerant matrix multiplication on hexagonal Arrays.
-
VHDL Code Automatic Generator for Systolic Arrays
2006 2nd International Conference on Information & Communication Technologies
Co-Authors: I. N. Tselepis, M. P. Bekakos
Abstract: Systolic Arrays speed up scientific computations with inherent parallelization, by exploiting massive data pipeline parallelism. In addition, they offer short and problem-size-independent signal paths, predictable performance, scalability, and simple design and test. In this paper, a server-based software tool for the automatic generation of VHDL code describing Systolic Array topologies is presented. The input parameters of the tool are several essential factors for the architectural description of Systolic Arrays (SAs): the interconnection topology of the Systolic array (linear, mesh, or hex-connected), the size of the Systolic array (the number of processing elements (PEs) in each dimension), the function of each PE (the relation between its output and input ports), and the bit length of the PE ports (the data word size of every port).
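The tool itself is not public, but the core idea of parametric HDL generation can be sketched as string templating: port widths, names, and topology sizes become template parameters. The entity name and function below are hypothetical; this only illustrates the bit-length parameter from the abstract, not the tool's actual output.

```python
# Minimal sketch of parametric VHDL generation: a PE entity whose port
# width is driven by a "bitlength" input parameter, as in the abstract.
PE_ENTITY_TEMPLATE = """\
entity {name} is
  port (
    a_in  : in  std_logic_vector({msb} downto 0);
    b_in  : in  std_logic_vector({msb} downto 0);
    a_out : out std_logic_vector({msb} downto 0);
    b_out : out std_logic_vector({msb} downto 0)
  );
end {name};
"""

def generate_pe_entity(name="systolic_pe", bitlength=8):
    # Emit a VHDL entity declaration for one PE with the requested port width.
    return PE_ENTITY_TEMPLATE.format(name=name, msb=bitlength - 1)
```

A full generator would similarly template the PE architecture body (the PE function) and instantiate PEs in a linear, mesh, or hexagonal interconnect.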
Igor Z. Milovanovic - One of the best experts on this subject based on the ideXlab platform.
-
design of linear Systolic Arrays for matrix multiplication
Advances in Electrical and Computer Engineering, 2014
Co-Authors: Emina I. Milovanovic, Igor Z. Milovanovic, Mile K. Stojcev, Tatjana R. Nikolic
Abstract: This paper presents an architecture for matrix multiplication optimized to be integrated as an accelerator unit into a host computer. Two linear Systolic Arrays with unidirectional data flow ...
-
Orthogonal fault-tolerant Systolic Arrays for matrix multiplication
Microelectronics Reliability, 2011
Co-Authors: Igor Z. Milovanovic, Emina I. Milovanovic, Mile K. Stojcev, M. P. Bekakos
Abstract: A systematic approach for designing one class of fault-tolerant Systolic Arrays (FTSAs) with orthogonal interconnects and unidirectional data flow, the Orthogonal Unidirectional Systolic Array (OUSA), for multiplication of rectangular matrices is presented in this paper. The method employs space-time redundancy to achieve fault tolerance. By applying the proposed systematic design procedure, four different Systolic Arrays of the OUSA type are obtained. All the Arrays can tolerate single transient errors, and the majority of multiple errors with high probability. To provide high bandwidth in data access, special hardware called an address generator unit was designed. Hardware complexity and performance gains achieved at the higher (system, algorithm, and architecture) design levels were analyzed. The obtained results show that with n² + 2n processing elements the total execution time of the fault-tolerant algorithm is 6n + 3 time units; the hardware overhead due to fault tolerance ranges from 6.25% down to 0.8%, while the time overhead is 50%. In addition, the hardware-implemented address generation unit reduces the total execution time of the algorithm almost five times compared with software address calculations.
-
two level pipelined Systolic Arrays for matrix vector multiplication
Journal of Systems Architecture, 1998
Co-Authors: Ivan Milentijevic, Emina I. Milovanovic, Igor Z. Milovanovic, Milorad Tosic, Mile K. Stojcev
Abstract: Novel two-level pipelined linear Systolic Arrays for matrix-vector multiplication are proposed. The number of processing elements in the proposed Arrays is reduced to half of that in the existing Arrays. An area-time (AT) criterion is used to compare the proposed Arrays with the fastest existing one.
-
The Design of Optimal Planar Systolic Arrays for Matrix Multiplication
Computers & Mathematics with Applications, 1997
Co-Authors: Ivan Milentijevic, Emina I. Milovanovic, Igor Z. Milovanovic, Mile K. Stojcev
Abstract: The objective of this paper is to provide a systematic methodology for the design of space-time optimal pure planar Systolic Arrays for matrix multiplication. The procedure is based on the data dependence approach. By the described procedure, we obtain ten different Systolic Arrays, denoted S1 to S10, classified into three classes according to the interconnection patterns between the processing elements. Common properties of all the Systolic array designs are: each Systolic array consists of n² processing elements, near-neighbour communications, and an active execution time of 3n − 2 time units. Compared with designs found in the literature, our procedure always leads to Systolic Arrays with the optimal number of processing elements. The improvement in the space domain is not achieved at the cost of execution time or PE complexity. We present a mathematically rigorous procedure that gives the exact ordering of the input matrix elements at the beginning of the computation. Examples illustrating the methodology are shown.
-
Matrix multiplication on non-planar Systolic Arrays
4th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, TELSIKS'99 (Cat. No.99EX365)
Co-Authors: T. I. Tokić, Emina I. Milovanovic, Igor Z. Milovanovic, N. M. Novakovic, M. K. Stojcev
Abstract: A modification of the standard design procedure for mapping nested loop algorithms onto Systolic Arrays is described in this article. This modification enables the authors to obtain non-planar Systolic Arrays for matrix multiplication with an optimal number of processing elements for a given problem size. The modification is based on the composition of two linear mappings.
Raehong Park - One of the best experts on this subject based on the ideXlab platform.
-
unified Systolic Arrays for computation of the dct dst dht
IEEE Transactions on Circuits and Systems for Video Technology, 1997
Co-Authors: Sung Bum Pan, Raehong Park
Abstract: We propose unified Systolic Arrays for the computation of the one-dimensional (1-D) and two-dimensional (2-D) discrete cosine transform/discrete sine transform/discrete Hartley transform (DCT/DST/DHT). By decomposing the transforms into even- and odd-numbered frequency samples, the proposed architecture computes the 1-D DCT/DST/DHT. Compared with conventional methods, the proposed Systolic Arrays exhibit advantages in terms of the number of PEs and latency. We generalize the proposed structure for the computation of the 2-D DCT/DST/DHT. The unified Systolic Arrays can also be employed for computation of the inverse transforms (IDCT/IDST/IDHT).
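The even/odd frequency decomposition the abstract mentions can be seen concretely for the DCT-II case: the even-indexed coefficients of a length-N DCT-II equal the length-N/2 DCT-II of the folded sequence x[n] + x[N−1−n]. The sketch below verifies this identity numerically with an unnormalized direct DCT; it illustrates the decomposition only, not the paper's systolic architecture.

```python
import math

def dct2(x):
    """Direct (unnormalized) DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]

def dct2_even_via_fold(x):
    """Even-indexed DCT-II coefficients of x, obtained from a half-length DCT.

    Because cos(pi (2(N-1-n)+1) k / N) = cos(pi (2n+1) k / N), pairing n with
    N-1-n folds the signal, so X[2k] of x equals the DCT-II of the fold.
    """
    N = len(x)
    folded = [x[n] + x[N - 1 - n] for n in range(N // 2)]
    return dct2(folded)
```

Decompositions like this halve the transform length per branch, which is what lets the unified array cut the PE count and latency.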
-
vlsi architectures for block matching algorithms using Systolic Arrays
IEEE Transactions on Circuits and Systems for Video Technology, 1996
Co-Authors: Sung Bum Pan, Seung Soo Chae, Raehong Park
Abstract: We investigate the hardware implementation of block matching algorithms (BMAs) for motion estimation in moving sequences. Using Systolic Arrays, we propose VLSI architectures for the two-stage BMA and the full-search (FS) BMA. The two-stage BMA, using integral projections, greatly reduces the computational complexity while achieving performance comparable to that of the FS BMA. The proposed hardware architectures for the two-stage BMA and FS BMA are faster than conventional hardware architectures, with lower hardware complexity. Also, the proposed architecture for the first stage of the two-stage BMA is modeled in VHDL and simulated. Simulation results confirm the functional validity of the proposed architecture.
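The computation that the FS BMA hardware accelerates is a straightforward SAD minimization, sketched below as a scalar software reference (the paper's contribution is the systolic implementation of this search, not this model):

```python
def full_search_bma(ref, cur, bx, by, bsize, srange):
    """Exhaustive block matching over a square search window.

    Finds the motion vector (dy, dx) minimizing the sum of absolute
    differences (SAD) between the current frame's block at (bx, by) and
    displaced candidate blocks in the reference frame.
    """
    H, W = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            y0, x0 = by + dy, bx + dx
            if y0 < 0 or x0 < 0 or y0 + bsize > H or x0 + bsize > W:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + i][bx + j] - ref[y0 + i][x0 + j])
                      for i in range(bsize) for j in range(bsize))
            if sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad

# Synthetic example: the current frame is the reference shifted by (1, 2).
ref = [[16 * y + x for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
        for x in range(12)] for y in range(12)]
```

The two-stage BMA prunes this search by first matching row/column sums (integral projections) before computing full SADs on surviving candidates.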