The experts below are selected from a list of 360 experts worldwide ranked by the ideXlab platform.
Jon Stockwood - One of the best experts on this subject based on the ideXlab platform.
-
Hardware-software co-design of embedded reconfigurable architectures
Design Automation Conference, 2000
Co-Authors: Yanbing Li, Timothy J Callahan, Ervan Darnell, Randolph Harr, Uday Kurkure, Jon Stockwood
Abstract: In this paper we describe a new hardware/software partitioning approach for embedded reconfigurable architectures consisting of a general-purpose processor (CPU), a dynamically reconfigurable datapath (e.g. an FPGA), and a memory hierarchy. We have developed a framework called Nimble that automatically compiles system-level applications specified in C to executables on the target platform. A key component of this framework is a hardware/software partitioning algorithm that performs fine-grained partitioning (at loop and basic-block levels) of an application to execute on the combined CPU and datapath. The partitioning algorithm optimizes the global application execution time, including the software and hardware execution times, communication time and datapath reconfiguration time. Experimental results on real applications show that our algorithm is effective in rapidly finding close to optimal solutions.
P A Ivey - One of the best experts on this subject based on the ideXlab platform.
-
An array processor for general purpose digital image compression
IEEE Journal of Solid-State Circuits, 1995
Co-Authors: R B Yates, Neil A Thacker, S J Evans, S N Walker, P A Ivey
Abstract: A new VLSI processor (DIP chip) for image compression is presented which combines principles of multipipeline and array processing. The device is not specific to any one image compression algorithm and can be regarded as a general purpose processor. The chip has been implemented using a CMOS 1.0-μm process on a 14.4 × 13.5 mm² die. An internal clock frequency of 40 MHz results in 1.2 × 10⁹ operations/s on 8-bit data. Solutions to problems associated with the large bandwidth required, for both image data and instruction streams, is the main aim of the paper. The necessary problem of increasing the array clock frequency relative to the input/output clock frequency without the need for a large on-chip instruction cache or fast external clock speeds is also addressed.
-
An array processor for general purpose digital image compression
Custom Integrated Circuits Conference, 1995
Co-Authors: R B Yates, Neil A Thacker, S J Evans, S N Walker, P A Ivey
Abstract: A new VLSI processor (DIP chip) for image compression is presented which combines principles of multipipeline and array processing. The device is not specific to any one image compression algorithm and can be regarded as a general purpose processor. The chip has been implemented using a CMOS 1.0-μm process on a 14.4 × 13.5 mm² die. An internal clock frequency of 40 MHz results in 1.2 × 10⁹ operations/s on 8-bit data. Solutions to problems associated with the large bandwidth required, for both image data and instruction streams, is the main aim of the paper. The necessary problem of increasing the array clock frequency relative to the input/output clock frequency without the need for a large on-chip instruction cache or fast external clock speeds is also addressed.
R B Yates - One of the best experts on this subject based on the ideXlab platform.
-
An array processor for general purpose digital image compression
IEEE Journal of Solid-State Circuits, 1995
Co-Authors: R B Yates, Neil A Thacker, S J Evans, S N Walker, P A Ivey
Abstract: A new VLSI processor (DIP chip) for image compression is presented which combines principles of multipipeline and array processing. The device is not specific to any one image compression algorithm and can be regarded as a general purpose processor. The chip has been implemented using a CMOS 1.0-μm process on a 14.4 × 13.5 mm² die. An internal clock frequency of 40 MHz results in 1.2 × 10⁹ operations/s on 8-bit data. Solutions to problems associated with the large bandwidth required, for both image data and instruction streams, is the main aim of the paper. The necessary problem of increasing the array clock frequency relative to the input/output clock frequency without the need for a large on-chip instruction cache or fast external clock speeds is also addressed.
-
An array processor for general purpose digital image compression
Custom Integrated Circuits Conference, 1995
Co-Authors: R B Yates, Neil A Thacker, S J Evans, S N Walker, P A Ivey
Abstract: A new VLSI processor (DIP chip) for image compression is presented which combines principles of multipipeline and array processing. The device is not specific to any one image compression algorithm and can be regarded as a general purpose processor. The chip has been implemented using a CMOS 1.0-μm process on a 14.4 × 13.5 mm² die. An internal clock frequency of 40 MHz results in 1.2 × 10⁹ operations/s on 8-bit data. Solutions to problems associated with the large bandwidth required, for both image data and instruction streams, is the main aim of the paper. The necessary problem of increasing the array clock frequency relative to the input/output clock frequency without the need for a large on-chip instruction cache or fast external clock speeds is also addressed.
Piotr Dudek - One of the best experts on this subject based on the ideXlab platform.
-
Asynchronous cellular logic network as a co-processor for a general-purpose massively parallel array
International Journal of Circuit Theory and Applications, 2011
Co-Authors: Alexey Lopich, Piotr Dudek
Abstract: This paper demonstrates an implementation of an asynchronous cellular processor array that facilitates binary trigger-wave propagations, extensively used in various image-processing algorithms. The circuit operates in a continuous-time mode, achieving high operational performance and low-power consumption. An integrated circuit with proof-of-concept array of 24 × 60 cells has been fabricated in a 0.35 µm three-metal CMOS process and tested. Occupying only 16 × 8 µm², the binary wave-propagation cell is designed to be used as a co-processor in general-purpose processor-per-pixel arrays intended for focal-plane image processing. The results of global operations such as object reconstruction and hole filling are presented.
-
Implementation of an asynchronous cellular logic network as a co-processor for a general-purpose massively parallel array
European Conference on Circuit Theory and Design, 2007
Co-Authors: Alexey Lopich, Piotr Dudek
Abstract: In this paper we present an implementation of an asynchronous cellular processor array that facilitates binary trigger-wave propagations, extensively used in various image processing algorithms. The circuit operates in a continuous-time mode, achieving high operational performance and low power consumption. A 24 × 60 proof-of-concept array integrated circuit has been fabricated in a 0.35 µm 3-metal CMOS process and tested. Occupying only 16 × 8 µm², the binary wave-propagation cell is used as a coprocessor in a general-purpose processor-per-pixel array that is designed for focal-plane image processing. The results of global operations such as object reconstruction and hole filling are presented.
-
A general-purpose processor-per-pixel analog SIMD vision chip
IEEE Transactions on Circuits and Systems, 2005
Co-Authors: Piotr Dudek, P J Hicks
Abstract: A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented. The architecture of the device is based on a fine-grain software-programmable SIMD processor array. Processing elements, integrated within each pixel of the imager, are implemented utilising a switched-current analog microprocessor concept. This allows the achievement of real-time image processing speeds with high efficiency in terms of silicon area and power dissipation. The prototype 21 × 21 vision chip is fabricated in a 0.6 µm CMOS technology and achieves a cell size of 98.6 µm × 98.6 µm. It executes over 1.1 giga instructions per second (GIPS) while dissipating under 40 mW of power. The architecture, circuit design and experimental results are presented in this paper.
Timothy J Callahan - One of the best experts on this subject based on the ideXlab platform.
-
Hardware-software co-design of embedded reconfigurable architectures
Design Automation Conference, 2000
Co-Authors: Yanbing Li, Timothy J Callahan, Ervan Darnell, Randolph Harr, Uday Kurkure, Jon Stockwood
Abstract: In this paper we describe a new hardware/software partitioning approach for embedded reconfigurable architectures consisting of a general-purpose processor (CPU), a dynamically reconfigurable datapath (e.g. an FPGA), and a memory hierarchy. We have developed a framework called Nimble that automatically compiles system-level applications specified in C to executables on the target platform. A key component of this framework is a hardware/software partitioning algorithm that performs fine-grained partitioning (at loop and basic-block levels) of an application to execute on the combined CPU and datapath. The partitioning algorithm optimizes the global application execution time, including the software and hardware execution times, communication time and datapath reconfiguration time. Experimental results on real applications show that our algorithm is effective in rapidly finding close to optimal solutions.