Data-Processing Instruction

The Experts below are selected from a list of 180807 Experts worldwide ranked by ideXlab platform

Qingdong Yao - One of the best experts on this subject based on the ideXlab platform.

  • Media digital signal processor core design for multimedia application
    Multimedia on Mobile Devices II, 2006
    Co-Authors: Peng Liu, Wei-guang Cai, Qingdong Yao
    Abstract:

    This paper describes MediaDSP3200, an embedded media processor core fabricated in a six-layer-metal 0.18 µm CMOS process that implements a RISC instruction set, a DSP data-processing instruction set, and a single-instruction-multiple-data (SIMD) multimedia-enhanced instruction set. MediaDSP3200 fuses RISC architecture with DSP computation capability: the RISC base, DSP extension, and SIMD instruction sets share a unified pipeline and support various addressing modes. These characteristics greatly enhance digital signal processing performance. The test processor achieves 320 MOPS for 32x32-bit multiply-accumulate (MAC) operations and 1280 MOPS for 16x16-bit MAC operations, and dissipates 600 mW at 1.8 V and 320 MHz. The implementation is primarily a standard-cell logic design. MediaDSP3200 targets embedded applications that need both strong processing and control capability and low cost, e.g. set-top boxes, video conferencing, and DTV. The paper describes the MediaDSP3200 instruction set architecture, addressing modes, pipeline design, SIMD features, split ALU, and MAC unit, and gives performance benchmarks based on H.264 and MPEG decoder algorithms.
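The throughput and power figures quoted in the abstract can be related with some simple arithmetic. The sketch below is illustrative only: it assumes the MOPS figures were measured at the quoted 320 MHz clock and that the 600 mW figure applies while the MAC unit is running at full rate, neither of which the abstract states explicitly. The function names are hypothetical.

```python
def ops_per_cycle(mops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle (MOPS divided by MHz)."""
    return mops / clock_mhz


def energy_per_op_nj(power_mw: float, mops: float) -> float:
    """Average energy per operation in nanojoules.

    1 mW / 1 MOPS = 1e-3 J/s / 1e6 ops/s = 1e-9 J per op = 1 nJ per op.
    """
    return power_mw / mops


# Figures taken from the abstract; their pairing is an assumption (see above).
CLOCK_MHZ = 320.0
POWER_MW = 600.0

mac32 = ops_per_cycle(320.0, CLOCK_MHZ)    # 1.0 MAC per cycle for 32x32-bit
mac16 = ops_per_cycle(1280.0, CLOCK_MHZ)   # 4.0 MACs per cycle for 16x16-bit
nj16 = energy_per_op_nj(POWER_MW, 1280.0)  # 0.46875 nJ per 16x16-bit MAC
```

Under these assumptions the 16x16-bit rate is four times the 32x32-bit rate, which would be consistent with a split datapath processing four narrow operands per cycle, although the abstract does not spell out the mechanism.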

Gurindar S. Sohi - One of the best experts on this subject based on the ideXlab platform.

  • Design and evaluation of a multiscalar processor
    1998
    Co-Authors: Scott E. Breach, Gurindar S. Sohi
    Abstract:

    As the demand for processing power continues to escalate, future processor designs intended to meet this demand for performance must do so within the constraints of future implementation technology and the limits of practicable implementation costs. This thesis investigates a new type of processor based on the novel multiscalar paradigm. A multiscalar processor uses a “divide and conquer” strategy as a means to overcome the engineering challenges that face existing types of processors with respect to achieving high performance via improvements in instruction-level parallelism and clock speed. This thesis focuses on the three most significant aspects of a multiscalar processor: instruction and data processing, instruction supply, and data supply. Detailed design descriptions and experimental evaluations are provided, identifying the impact of each aspect in terms of its individual performance as well as its contribution to overall performance. In addition, a comparison of realistic multiscalar and idealistic superscalar designs is provided to ascertain how this alternative approach performs relative to a well-known conventional approach. The key components that dictate the characteristics of a multiscalar processor (processing units for instruction and data processing; hierarchical prediction and instruction memory for instruction supply; register file and data memory for data supply) are discussed in terms of the basic issues involved in their design. Moreover, the challenges/concerns for alternative designs are presented to focus on promising candidates for study. Each candidate is specified in terms of its overall structure and is evaluated under a range of design parameters to characterize its behavior and potential bottlenecks. The performance comparison measures the speedup, relative to a baseline 1-wide out-of-order issue processor, of realistic multiscalar processors and idealistic superscalar processors. Given the microarchitecture and compiler capabilities assumed, this study indicates that even without an advantage in clock speed multiscalar processors can outperform superscalar processors, over a large range of configurations for the SPEC CFP95 programs, but over only a small range for the SPEC CINT95 programs. However, a key limitation of this work is that it is unable to factor in the clock speed difference between multiscalar and superscalar processors expected in actual implementations.
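The speedup metric mentioned in the abstract, and the clock-speed limitation it notes, can be illustrated with a minimal sketch. This is not the thesis's methodology: the function name, the `clock_ratio` parameter, and the cycle counts are hypothetical, chosen only to show how a cycle-count comparison would change if a clock-speed difference could be factored in.

```python
def speedup(baseline_cycles: float, candidate_cycles: float,
            clock_ratio: float = 1.0) -> float:
    """Speedup of a candidate design over a baseline.

    baseline_cycles / candidate_cycles compares cycle counts only.
    clock_ratio is candidate clock frequency divided by baseline clock
    frequency; 1.0 means equal clocks (no clock-speed advantage).
    """
    if baseline_cycles <= 0 or candidate_cycles <= 0 or clock_ratio <= 0:
        raise ValueError("cycle counts and clock ratio must be positive")
    return (baseline_cycles / candidate_cycles) * clock_ratio


# Hypothetical cycle counts, for illustration only.
equal_clock = speedup(1000, 500)          # 2.0: cycle-count speedup alone
faster_clock = speedup(1000, 500, 1.5)    # 3.0: same cycles, 1.5x clock
```

With `clock_ratio` fixed at 1.0 the comparison reflects cycle counts alone, which corresponds to the "without an advantage in clock speed" setting described above.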
