Parallel Vector


The Experts below are selected from a list of 49,278 Experts worldwide, ranked by the ideXlab platform

Edward Rutledge - One of the best experts on this subject based on the ideXlab platform.

  • Parallel vsipl an open standard software library for high performance Parallel signal processing
    Proceedings of the IEEE, 2005
    Co-Authors: J M Lebak, Jeremy Kepner, Henry Hoffmann, Edward Rutledge
    Abstract:

    Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable Parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems, the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across Parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto Parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for runtime optimization. Parallel arrays and functions in Parallel VSIPL++ also support explicit setup and run stages, which are used to accelerate communication operations. The Parallel mapping mechanism provides an external interface that allows optimal mappings to be generated offline and read into the system at runtime. Finally, the standard has been developed in collaboration with high performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance.
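The explicit setup/run split the abstract describes can be illustrated with a short, hypothetical sketch (this is not the VSIPL++ API): expensive planning is done once in the setup stage, so each invocation of the run stage stays cheap and amortizes the planning cost over many frames.

```python
import cmath

class PlannedDFT:
    """Illustrative setup/run idiom: planning (precomputing twiddle
    factors) happens once, mimicking how computation objects with an
    explicit setup stage keep the repeatable run stage cheap."""

    def __init__(self, n):
        self.n = n
        # Setup stage: precompute all twiddle factors once.
        self.tw = [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
                   for j in range(n)]

    def run(self, x):
        # Run stage: per-frame evaluation (a naive O(n^2) DFT here).
        return [sum(w * v for w, v in zip(row, x)) for row in self.tw]

dft = PlannedDFT(4)                  # setup once
y = dft.run([1.0, 0.0, 0.0, 0.0])    # run many times per plan
# The DFT of a unit impulse is all ones.
assert all(abs(v - 1.0) < 1e-9 for v in y)
```

The same idiom also applies to the communication operations mentioned above: a mapping can be planned once at setup and replayed cheaply on every frame.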

  • an open standard software library for high performance Parallel signal processing the Parallel vsipl library
    2004
    Co-Authors: J M Lebak, Jeremy Kepner, Henry Hoffmann, Edward Rutledge
    Abstract:

    Real-time signal processing consumes the majority of the world’s computing power. Increasingly, programmable Parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across Parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto Parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for run-time optimization. Parallel arrays and functions in VSIPL++ support these same features, which are used to accelerate communication operations. The Parallel mapping mechanism provides an external interface that allows optimal mappings to be generated off-line and read into the system at run-time. Finally, the standard has been developed in collaboration with high performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance. Copyright 2004 MIT Lincoln Laboratory. This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government. (Draft of April 15, 2004.)

  • pvl an object oriented software library for Parallel signal processing
    Foundations of Computer Science, 2001
    Co-Authors: Edward Rutledge, Jeremy Kepner
    Abstract:

    Real-time signal processing consumes the majority of the world’s computing power. Increasingly, programmable Parallel microprocessors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in the mapping (i.e., placement and routing) of an algorithm onto a Parallel computer in a general manner that preserves software portability. We have developed the Parallel Vector Library (PVL) to allow signal processing algorithms to be written using high-level Matlab-like constructs that are independent of the underlying Parallel mapping. Programs written using PVL can be ported to a wide range of Parallel computers without sacrificing performance. Furthermore, the mapping concepts in PVL provide the infrastructure for enabling new capabilities such as fault tolerance, dynamic scheduling, and self-optimization. This presentation discusses PVL with particular emphasis on quantitative comparisons with standard Parallel signal programming practices.
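The key idea of a map-independent algorithm can be sketched in a few lines (a hypothetical stand-in, not PVL itself): the map object decides which processor owns which slice of a vector, and the algorithm is written once against the map, so changing the number of processors never touches the algorithm code.

```python
def block_map(n, num_procs):
    """Hypothetical PVL-style map: assigns n vector elements to
    num_procs processors in contiguous blocks, returning (lo, hi)
    index bounds per processor."""
    base, extra = divmod(n, num_procs)
    bounds, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def local_sum(vec, bounds, p):
    # Each "processor" p reduces only the slice the map assigns to it.
    lo, hi = bounds[p]
    return sum(vec[lo:hi])

vec = list(range(10))
for procs in (1, 2, 3):   # remap without touching the algorithm
    b = block_map(len(vec), procs)
    assert sum(local_sum(vec, b, p) for p in range(procs)) == sum(vec)
```

Because the reduction is expressed against the map rather than a fixed layout, the same program ports across machine sizes, which is the portability property the abstract claims for PVL.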

Mingoo Seok - One of the best experts on this subject based on the ideXlab platform.

  • c3sram an in memory computing sram macro based on robust capacitive coupling computing mechanism
    IEEE Journal of Solid-state Circuits, 2020
    Co-Authors: Zhewei Jiang, Shihui Yin, Jaesun Seo, Mingoo Seok
    Abstract:

    This article presents C3SRAM, an in-memory-computing SRAM macro. The macro is an SRAM module with the circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations. The macro utilizes analog-mixed-signal (AMS) capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations. Without the need to access the stored weights by individual row, the macro asserts all its rows simultaneously and forms an analog voltage at the read bitline node through capacitive voltage division. With one analog-to-digital converter (ADC) per column, the macro realizes fully Parallel Vector–matrix multiplication in a single cycle. The network type that the macro supports and the computing mechanism it utilizes are determined by the robustness and error tolerance necessary in AMS computing. The C3SRAM macro is prototyped in a 65-nm CMOS. It demonstrates an energy efficiency of 672 TOPS/W and a speed of 1638 GOPS (20.2 TOPS/mm²), achieving a 3975× better energy–delay product than the conventional digital baseline performing the same operation. The macro achieves 98.3% accuracy for MNIST and 85.5% for CIFAR-10, which is among the best in-memory computing works in terms of energy efficiency and inference accuracy tradeoff.
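The binary multiply-and-accumulate that the macro evaluates in the analog domain can be checked digitally: with weights and activations in {-1, +1}, the dot product reduces to an XNOR-and-popcount, and in an idealized, parasitic-free model (an assumption of this sketch, not the paper's circuit analysis) the bitline voltage produced by capacitive voltage division is proportional to that popcount.

```python
def xnor_mac(weights, activations):
    """Binary MAC as XNOR-and-popcount: encoding +1 as bit 1 and
    -1 as bit 0, the dot product equals 2*popcount(XNOR) - N."""
    n = len(weights)
    wb = [(w + 1) // 2 for w in weights]        # {-1,+1} -> {0,1}
    ab = [(a + 1) // 2 for a in activations]
    pop = sum(1 - (w ^ a) for w, a in zip(wb, ab))  # XNOR then popcount
    return 2 * pop - n

def bitline_voltage(weights, activations, vdd=1.0):
    """Toy model of the capacitive voltage divider: each matching
    bitcell couples Vdd onto the shared bitline, so the settled
    voltage is proportional to the popcount (idealized, no parasitics)."""
    n = len(weights)
    pop = (xnor_mac(weights, activations) + n) // 2
    return vdd * pop / n

# Dot product of [1,-1,1,-1] and [1,1,-1,-1] is 0; half the cells match.
assert xnor_mac([1, -1, 1, -1], [1, 1, -1, -1]) == 0
assert abs(bitline_voltage([1, -1, 1, -1], [1, 1, -1, -1]) - 0.5) < 1e-9
```

One per-column ADC then digitizes this voltage, which is how the macro evaluates every row of the Vector-matrix product in a single cycle.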

  • c3sram in memory computing sram macro based on capacitive coupling computing
    European Solid-State Circuits Conference, 2019
    Co-Authors: Zhewei Jiang, Shihui Yin, Jaesun Seo, Mingoo Seok
    Abstract:

    This letter presents C3SRAM, an in-memory-computing SRAM macro, which utilizes analog-mixed-signal capacitive-coupling computing to perform XNOR-and-accumulate operations for binary deep neural networks. The 256×64 C3SRAM macro asserts all 256 rows simultaneously and equips one ADC per column, realizing fully Parallel Vector–matrix multiplication in one cycle. C3SRAM demonstrates 672 TOPS/W and 1638 GOPS, and achieves 98.3% accuracy for MNIST and 85.5% for the CIFAR-10 dataset. It achieves a 3975× smaller energy–delay product than conventional digital processors.

Jeremy Kepner - One of the best experts on this subject based on the ideXlab platform.

  • Parallel vsipl an open standard software library for high performance Parallel signal processing
    Proceedings of the IEEE, 2005
    Co-Authors: J M Lebak, Jeremy Kepner, Henry Hoffmann, Edward Rutledge
    Abstract:

    Real-time signal processing consumes the majority of the world's computing power. Increasingly, programmable Parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems, the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across Parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto Parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for runtime optimization. Parallel arrays and functions in Parallel VSIPL++ also support explicit setup and run stages, which are used to accelerate communication operations. The Parallel mapping mechanism provides an external interface that allows optimal mappings to be generated offline and read into the system at runtime. Finally, the standard has been developed in collaboration with high performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance.

  • an open standard software library for high performance Parallel signal processing the Parallel vsipl library
    2004
    Co-Authors: J M Lebak, Jeremy Kepner, Henry Hoffmann, Edward Rutledge
    Abstract:

    Real-time signal processing consumes the majority of the world’s computing power. Increasingly, programmable Parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in allowing the user to write programs at a high level, while still achieving performance and preserving the portability of the code across Parallel computing hardware platforms. The Parallel Vector, Signal, and Image Processing Library (Parallel VSIPL++) addresses this hurdle by providing high-level C++ array constructs, a simple mechanism for mapping data and functions onto Parallel hardware, and a community-defined portable interface. This paper presents an overview of the Parallel VSIPL++ standard as well as a deeper description of the technical foundations and expected performance of the library. Parallel VSIPL++ supports adaptive optimization at many levels. The C++ arrays are designed to support automatic hardware specialization by the compiler. The computation objects (e.g., fast Fourier transforms) are built with explicit setup and run stages to allow for run-time optimization. Parallel arrays and functions in VSIPL++ support these same features, which are used to accelerate communication operations. The Parallel mapping mechanism provides an external interface that allows optimal mappings to be generated off-line and read into the system at run-time. Finally, the standard has been developed in collaboration with high performance embedded computing vendors and is compatible with their proprietary approaches to achieving performance. Copyright 2004 MIT Lincoln Laboratory. This work is sponsored by the High Performance Computing Modernization Office under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government. (Draft of April 15, 2004.)

  • software technologies for high performance Parallel signal processing
    2003
    Co-Authors: Jeremy Kepner, J M Lebak
    Abstract:

    Real-time signal processing consumes the majority of the world’s computing power. Increasingly, programmable Parallel processors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems the major challenge is no longer the speed of the hardware but the complexity of optimized software. Specifically, the key technical hurdle lies in mapping an algorithm onto a Parallel computer in a general manner that preserves performance while providing software portability. We have developed the Parallel Vector Library (PVL) to allow signal processing algorithms to be written with high-level mathematical constructs that are independent of the underlying Parallel mapping. Programs written using PVL can be ported to a wide range of Parallel computers without sacrificing performance. Furthermore, the mapping concepts in PVL provide the infrastructure for enabling new capabilities such as fault tolerance and self-optimization. This article discusses PVL with a particular emphasis on quantitative comparisons with standard Parallel signal programming practices.

  • pvl an object oriented software library for Parallel signal processing
    Foundations of Computer Science, 2001
    Co-Authors: Edward Rutledge, Jeremy Kepner
    Abstract:

    Real-time signal processing consumes the majority of the world’s computing power. Increasingly, programmable Parallel microprocessors are used to address a wide variety of signal processing applications (e.g., scientific, video, wireless, medical, communication, encoding, radar, sonar, and imaging). In programmable systems the major challenge is no longer hardware but software. Specifically, the key technical hurdle lies in the mapping (i.e., placement and routing) of an algorithm onto a Parallel computer in a general manner that preserves software portability. We have developed the Parallel Vector Library (PVL) to allow signal processing algorithms to be written using high-level Matlab-like constructs that are independent of the underlying Parallel mapping. Programs written using PVL can be ported to a wide range of Parallel computers without sacrificing performance. Furthermore, the mapping concepts in PVL provide the infrastructure for enabling new capabilities such as fault tolerance, dynamic scheduling, and self-optimization. This presentation discusses PVL with particular emphasis on quantitative comparisons with standard Parallel signal programming practices.

Kwang Y Lee - One of the best experts on this subject based on the ideXlab platform.

  • review multi objective based on Parallel Vector evaluated particle swarm optimization for optimal steady state performance of power systems
    Expert Systems With Applications, 2009
    Co-Authors: J G Vlachogiannis, Kwang Y Lee
    Abstract:

    In this paper, state-of-the-art multi-objective extensions of particle swarm optimization (PSO) are presented. We emphasize the co-evolution technique of the Parallel Vector evaluated PSO (VEPSO), which is analyzed and applied to a multi-objective steady-state problem in power systems. Specifically, reactive power control is formulated as a multi-objective optimization problem and solved using the Parallel VEPSO algorithm. The results on the IEEE 30-bus test system are compared with those given by another multi-objective evolutionary technique, demonstrating the advantage of Parallel VEPSO. The Parallel VEPSO is also tested on a larger power system with 136 buses.
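The co-evolution exchange at the heart of VEPSO can be sketched on a toy two-objective problem: each swarm evaluates only its own objective, but its velocity update steers toward the other swarm's global best, so the swarms trade information about both objectives. All coefficients and the test functions below are illustrative assumptions, not the paper's settings.

```python
import random

def vepso(f1, f2, iters=200, n=10, seed=0):
    """Minimal two-swarm VEPSO sketch: swarm 1 evaluates f1, swarm 2
    evaluates f2, and each swarm's social term uses the *other*
    swarm's global best (the VEPSO co-evolution exchange)."""
    rng = random.Random(seed)
    swarms = []
    for f in (f1, f2):
        xs = [rng.uniform(-5, 5) for _ in range(n)]
        pb = xs[:]                         # personal bests
        swarms.append({"f": f, "x": xs, "v": [0.0] * n,
                       "pb": pb, "gb": min(pb, key=f)})
    for _ in range(iters):
        for i, s in enumerate(swarms):
            other_gb = swarms[1 - i]["gb"]  # best exchanged from the other swarm
            f = s["f"]
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                s["v"][j] = (0.7 * s["v"][j]
                             + 1.5 * r1 * (s["pb"][j] - s["x"][j])
                             + 1.5 * r2 * (other_gb - s["x"][j]))
                s["x"][j] += s["v"][j]
                if f(s["x"][j]) < f(s["pb"][j]):
                    s["pb"][j] = s["x"][j]
            s["gb"] = min(s["pb"], key=f)
    return swarms[0]["gb"], swarms[1]["gb"]

# Two conflicting objectives with optima at x=0 and x=2; the exchange
# pulls each swarm's search toward the trade-off region between them.
g1, g2 = vepso(lambda x: x * x, lambda x: (x - 2) ** 2)
```

Parallelizing this scheme, as the paper does, amounts to running each swarm on its own networked PC and exchanging only the scalar global bests.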

  • determining generator contributions to transmission system using Parallel Vector evaluated particle swarm optimization
    IEEE Transactions on Power Systems, 2005
    Co-Authors: J G Vlachogiannis, Kwang Y Lee
    Abstract:

    In this paper, the generator contributions to the transmission system are determined by an evolutionary computation technique. Evaluating the contributions of generators to the power flows in transmission lines is formulated as a multiobjective optimization problem and calculated using a Parallel Vector evaluated particle swarm optimization (VEPSO) algorithm. Specifically, the contributions are modeled by particles of swarms whose positions are optimally determined while satisfying all multiobjectives and other physical and operating constraints. The VEPSO method is Parallelized by distributing the swarms in a number of networked PCs. The proposed Parallel VEPSO algorithm accounts for nonlinear characteristics of the generators and transmission lines. The applicability of the proposed Parallel VEPSO algorithm in assessing the generator contributions is demonstrated and compared with analytical methods for four different systems: three-bus, six-bus, IEEE 30-bus, and 136-bus test systems. The experimental results show that the proposed Parallel VEPSO algorithm is capable of obtaining precise solutions compared to analytical methods while considering nonlinear characteristics of the systems.

Zhewei Jiang - One of the best experts on this subject based on the ideXlab platform.

  • c3sram an in memory computing sram macro based on robust capacitive coupling computing mechanism
    IEEE Journal of Solid-state Circuits, 2020
    Co-Authors: Zhewei Jiang, Shihui Yin, Jaesun Seo, Mingoo Seok
    Abstract:

    This article presents C3SRAM, an in-memory-computing SRAM macro. The macro is an SRAM module with the circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations. The macro utilizes analog-mixed-signal (AMS) capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations. Without the need to access the stored weights by individual row, the macro asserts all its rows simultaneously and forms an analog voltage at the read bitline node through capacitive voltage division. With one analog-to-digital converter (ADC) per column, the macro realizes fully Parallel Vector–matrix multiplication in a single cycle. The network type that the macro supports and the computing mechanism it utilizes are determined by the robustness and error tolerance necessary in AMS computing. The C3SRAM macro is prototyped in a 65-nm CMOS. It demonstrates an energy efficiency of 672 TOPS/W and a speed of 1638 GOPS (20.2 TOPS/mm²), achieving a 3975× better energy–delay product than the conventional digital baseline performing the same operation. The macro achieves 98.3% accuracy for MNIST and 85.5% for CIFAR-10, which is among the best in-memory computing works in terms of energy efficiency and inference accuracy tradeoff.

  • c3sram in memory computing sram macro based on capacitive coupling computing
    European Solid-State Circuits Conference, 2019
    Co-Authors: Zhewei Jiang, Shihui Yin, Jaesun Seo, Mingoo Seok
    Abstract:

    This letter presents C3SRAM, an in-memory-computing SRAM macro, which utilizes analog-mixed-signal capacitive-coupling computing to perform XNOR-and-accumulate operations for binary deep neural networks. The 256×64 C3SRAM macro asserts all 256 rows simultaneously and equips one ADC per column, realizing fully Parallel Vector–matrix multiplication in one cycle. C3SRAM demonstrates 672 TOPS/W and 1638 GOPS, and achieves 98.3% accuracy for MNIST and 85.5% for the CIFAR-10 dataset. It achieves a 3975× smaller energy–delay product than conventional digital processors.