Processor Core

The Experts below are selected from a list of 33972 Experts worldwide ranked by ideXlab platform

Tatsuo Ohtsuki - One of the best experts on this subject based on the ideXlab platform.

  • ASP-DAC - An interface-circuit synthesis method with configurable Processor Core in IP-based SoC designs
    Proceedings of the 2006 conference on Asia South Pacific design automation - ASP-DAC '06, 2006
    Co-Authors: Shunitsu Kohara, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Naoki Tomono, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    In SoC designs, efficient communication between the hardware IPs and the on-chip Processor becomes very important; however, the interface is usually affected by the Processor Core specification. In this paper, we therefore focus on developing an efficient interface circuit architecture for communication between the on-chip Processor and embedded hardware IP Cores, and we propose a method to synthesize it. Experimental results from designing an MPEG-4 encode application show that our method obtains optimal interface circuits and works well.

  • Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis
    IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
    Co-Authors: Hideki Kawazu, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    A b-bit SIMD functional unit contains n k-bit sub-functional units, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a Processor Core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a Processor Core with small area under the given timing constraint. We expect to obtain Processor Core configurations of smaller area under the same timing constraint than a conventional system. Promising experimental results are also shown.

  • ASP-DAC - A Processor Core synthesis system in IP-based SoC design
    Proceedings of the 2005 conference on Asia South Pacific design automation - ASP-DAC '05, 2005
    Co-Authors: Naoki Tomono, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Shunitsu Kohara, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    This paper proposes a new design methodology for SoCs reusing hardware IPs. In our approach, after system-level HW/SW partitioning, we use IPs for hardware parts, but synthesize a new Processor Core instead of reusing a Processor Core IP. The system performs efficient parallel execution of hardware and software by taking into account the response time of each hardware IP, obtained by the proposed calculation algorithm. We can use optimal hardware IPs selected by the proposed hardware IP selection algorithm. The experimental results show the effectiveness of our new design methodology.

  • ASP-DAC - A hardware/software partitioning algorithm for SIMD Processor Cores
    Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC, 2003
    Co-Authors: Koichi Tachikake, Jinku Choi, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
    Abstract:

    This paper proposes a new hardware/software partitioning algorithm for Processor Cores with SIMD instructions. Given compiled assembly code including SIMD instructions, a timing constraint on execution time, and available hardware units, the proposed algorithm synthesizes an area-optimized Processor Core with new assembly code. Firstly, we assume an initial Processor Core on which the input assembly code can run with the shortest execution time. Secondly, we remove the hardware units added to the Processor Core one by one while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new Processor configuration. By repeating this process, we finally obtain a Processor Core architecture with small area under the given timing constraint. We expect that we can obtain a Processor Core which has appropriate SIMD functional units for running the input application program. Promising experimental results are also shown.

  • APCCAS (1) - An algorithm of hardware unit generation for Processor Core synthesis with packed SIMD type instructions
    Asia-Pacific Conference on Circuits and Systems, 2002
    Co-Authors: Yuichiro Miyaoka, Jinku Choi, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
    Abstract:

    The authors consider the synthesis of a Processor Core with SIMD instructions by a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized Processor Core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of Processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts the set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short, and thus it can be easily incorporated into the Processor Core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.
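
The reduction loop described in the hardware/software partitioning abstract above (start from the fastest configuration, then remove hardware units one by one while the timing constraint holds) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the `HardwareUnit` fields, the additive cycle model in `execution_time`, and the "largest area first" selection rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareUnit:
    name: str
    area: int          # hypothetical area cost of keeping the unit
    extra_cycles: int  # hypothetical cycles added once the unit is removed


def execution_time(base_cycles, removed):
    """Assumed cost model: every removed unit adds a fixed number of cycles."""
    return base_cycles + sum(u.extra_cycles for u in removed)


def reduce_units(units, base_cycles, timing_constraint):
    """Greedily remove units while the timing constraint stays satisfied."""
    kept = list(units)
    removed = []
    while True:
        # Units whose removal still meets the timing constraint.
        candidates = [
            u for u in kept
            if execution_time(base_cycles, removed + [u]) <= timing_constraint
        ]
        if not candidates:
            return kept, removed
        # Assumed heuristic: drop the unit that frees the most area.
        victim = max(candidates, key=lambda u: u.area)
        kept.remove(victim)
        removed.append(victim)


if __name__ == "__main__":
    units = [
        HardwareUnit("simd_mul", area=50, extra_cycles=40),
        HardwareUnit("simd_add", area=20, extra_cycles=30),
        HardwareUnit("shifter", area=10, extra_cycles=5),
    ]
    kept, removed = reduce_units(units, base_cycles=100, timing_constraint=150)
    print([u.name for u in kept], execution_time(100, removed))
```

In a real flow the assembly code would be rewritten after each removal and the execution time re-estimated from that code; the fixed `extra_cycles` value stands in for that step.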

Masahiko Yoshimoto - One of the best experts on this subject based on the ideXlab platform.

  • A 95 mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.

  • A 95mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    Symposium on VLSI Circuits, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video application like an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports the frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.

  • A sub-mW MPEG-4 motion estimation Processor Core for mobile video application
    IEEE Journal of Solid-state Circuits, 2004
    Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
    Abstract:

    This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method upgrades picture quality. The quality is almost equal to that of a full search method. An SIMD datapath architecture optimized for the algorithm decreases a clock frequency and supply voltage. A dedicated three-port SRAM macro for image data caches of the Processor is newly designed to reduce power consumption. It has been fabricated with 0.18-μm five-layer metal CMOS technology. The VLSI processing QCIF 15-f/s video consumes 0.4-mW power at 0.85-MHz clock frequency with 1.0-V supply voltage. It is applicable to mobile video applications.
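
To give a feel for why a descent-style search needs far fewer operations than a full search, here is a simplified block-matching sketch. It is not the GDS algorithm from the paper, whose details the abstract does not give: it is a plain greedy neighbour descent on the sum of absolute differences (SAD), and the function names, the four-neighbour step set, and the frame layout (lists of rows) are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def descent_search(cur, ref, bx, by, n, search_range):
    """Walk from (0, 0) to the best neighbouring vector until none improves."""
    height, width = len(ref), len(ref[0])

    def in_bounds(dx, dy):
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height)

    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while True:
        neighbours = [(best[0] + sx, best[1] + sy)
                      for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        scored = [(sad(cur, ref, bx, by, dx, dy, n), (dx, dy))
                  for dx, dy in neighbours if in_bounds(dx, dy)]
        if not scored:
            return best, best_cost
        cost, cand = min(scored)
        if cost >= best_cost:  # no neighbour is strictly better: stop
            return best, best_cost
        best, best_cost = cand, cost


if __name__ == "__main__":
    size = 12
    ref = [[x * x for x in range(size)] for _ in range(size)]
    cur = [[(x + 2) ** 2 for x in range(size)] for _ in range(size)]
    print(descent_search(cur, ref, bx=4, by=4, n=2, search_range=4))
```

Each step evaluates at most four candidates, whereas a full search evaluates every vector in the window. The trade-off is that a descent can stop in a local minimum, which is presumably what the sub-block search mentioned in the abstract helps to mitigate.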

Nozomu Togawa - One of the best experts on this subject based on the ideXlab platform.

  • ASP-DAC - An interface-circuit synthesis method with configurable Processor Core in IP-based SoC designs
    Proceedings of the 2006 conference on Asia South Pacific design automation - ASP-DAC '06, 2006
    Co-Authors: Shunitsu Kohara, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Naoki Tomono, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    In SoC designs, efficient communication between the hardware IPs and the on-chip Processor becomes very important; however, the interface is usually affected by the Processor Core specification. In this paper, we therefore focus on developing an efficient interface circuit architecture for communication between the on-chip Processor and embedded hardware IP Cores, and we propose a method to synthesize it. Experimental results from designing an MPEG-4 encode application show that our method obtains optimal interface circuits and works well.

  • Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis
    IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
    Co-Authors: Hideki Kawazu, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    A b-bit SIMD functional unit contains n k-bit sub-functional units, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a Processor Core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a Processor Core with small area under the given timing constraint. We expect to obtain Processor Core configurations of smaller area under the same timing constraint than a conventional system. Promising experimental results are also shown.

  • ASP-DAC - A Processor Core synthesis system in IP-based SoC design
    Proceedings of the 2005 conference on Asia South Pacific design automation - ASP-DAC '05, 2005
    Co-Authors: Naoki Tomono, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Shunitsu Kohara, Jumpei Uchida, Tatsuo Ohtsuki
    Abstract:

    This paper proposes a new design methodology for SoCs reusing hardware IPs. In our approach, after system-level HW/SW partitioning, we use IPs for hardware parts, but synthesize a new Processor Core instead of reusing a Processor Core IP. The system performs efficient parallel execution of hardware and software by taking into account the response time of each hardware IP, obtained by the proposed calculation algorithm. We can use optimal hardware IPs selected by the proposed hardware IP selection algorithm. The experimental results show the effectiveness of our new design methodology.

  • ASP-DAC - A hardware/software partitioning algorithm for SIMD Processor Cores
    Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC, 2003
    Co-Authors: Koichi Tachikake, Jinku Choi, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
    Abstract:

    This paper proposes a new hardware/software partitioning algorithm for Processor Cores with SIMD instructions. Given compiled assembly code including SIMD instructions, a timing constraint on execution time, and available hardware units, the proposed algorithm synthesizes an area-optimized Processor Core with new assembly code. Firstly, we assume an initial Processor Core on which the input assembly code can run with the shortest execution time. Secondly, we remove the hardware units added to the Processor Core one by one while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new Processor configuration. By repeating this process, we finally obtain a Processor Core architecture with small area under the given timing constraint. We expect that we can obtain a Processor Core which has appropriate SIMD functional units for running the input application program. Promising experimental results are also shown.

  • APCCAS (1) - An algorithm of hardware unit generation for Processor Core synthesis with packed SIMD type instructions
    Asia-Pacific Conference on Circuits and Systems, 2002
    Co-Authors: Yuichiro Miyaoka, Jinku Choi, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
    Abstract:

    The authors consider the synthesis of a Processor Core with SIMD instructions by a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized Processor Core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of Processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts the set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short, and thus it can be easily incorporated into the Processor Core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.
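
The relation b = k × n and the idea of stepping sub-operation parallelism down from n to n/2 to n/4, as described in the sub-operation parallelism abstract above, can be made concrete with a small sketch. The cycle model here is an assumption for illustration only (an instruction requesting `w` parallel k-bit operations takes `ceil(w / p)` cycles on a unit with `p` sub-functional units); the function names are hypothetical and the paper's actual estimation method is not given in the abstract.

```python
from math import ceil


def cycles(instr_widths, p):
    """Assumed model: each instruction needs ceil(width / p) cycles when the
    unit has p k-bit sub-functional units."""
    return sum(ceil(w / p) for w in instr_widths)


def minimise_parallelism(b, k, instr_widths, max_cycles):
    """Halve sub-operation parallelism while the cycle budget is still met."""
    if b % k != 0:
        raise ValueError("b must be a multiple of k")
    p = b // k  # full parallelism n, from b = k * n
    if cycles(instr_widths, p) > max_cycles:
        raise ValueError("timing constraint not met even at full parallelism")
    while p > 1 and p % 2 == 0 and cycles(instr_widths, p // 2) <= max_cycles:
        p //= 2
    return p


if __name__ == "__main__":
    # A 64-bit unit with 8-bit sub-operations gives n = 8.
    widths = [8, 4, 4, 2]  # hypothetical per-instruction parallel operation counts
    print(minimise_parallelism(64, 8, widths, max_cycles=6))
    print(minimise_parallelism(64, 8, widths, max_cycles=9))
```

Under this model a tighter cycle budget keeps more sub-functional units (and therefore more area), while a looser one lets the unit shrink further.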

Masayuki Miyama - One of the best experts on this subject based on the ideXlab platform.

  • A 95 mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.

  • A 95mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    Symposium on VLSI Circuits, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video application like an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports the frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.

  • A sub-mW MPEG-4 motion estimation Processor Core for mobile video application
    IEEE Journal of Solid-state Circuits, 2004
    Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
    Abstract:

    This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method upgrades picture quality. The quality is almost equal to that of a full search method. An SIMD datapath architecture optimized for the algorithm decreases a clock frequency and supply voltage. A dedicated three-port SRAM macro for image data caches of the Processor is newly designed to reduce power consumption. It has been fabricated with 0.18-μm five-layer metal CMOS technology. The VLSI processing QCIF 15-f/s video consumes 0.4-mW power at 0.85-MHz clock frequency with 1.0-V supply voltage. It is applicable to mobile video applications.

  • An ultra low power realtime MPEG2 MP@HL motion estimation Processor Core with SIMD datapath architecture optimized for gradient descent search algorithm
    Custom Integrated Circuits Conference, 2002
    Co-Authors: Masayuki Miyama, O Tooyama, Naoki Takamatsu, Tsuyoshi Kodake, Kazuo Nakamura, A Kato, J Miyakoshi, K Hashimoto, Satoshi Komatsu, Mikio Yagi
    Abstract:

    This paper describes a motion estimation (ME) Processor Core for realtime MP@HL video encoding. It is being fabricated with 0.13 μm CMOS technology and contains approximately 7 M transistors on a 4.50 mm × 3.35 mm area. The estimated power consumption is less than 100 mW at 81 MHz and 1.0 V. It features a gradient descent search (GDS) algorithm that drastically reduces the required computation power to 7 GOPS, an optimized SIMD datapath architecture that decreases the clock frequency and the operating voltage, and a low power 3-port data cache SRAM with a write-disturb-free cell array arrangement. The Core is applicable to a portable HDTV codec system.

Junichi Miyakoshi - One of the best experts on this subject based on the ideXlab platform.

  • A 95 mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as that in an HD camcorder. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.

  • A 95mW MPEG2 MP@HL motion estimation Processor Core for portable high resolution video application
    Symposium on VLSI Circuits, 2005
    Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
    Abstract:

    This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video application like an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports the frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.

  • A sub-mW MPEG-4 motion estimation Processor Core for mobile video application
    IEEE Journal of Solid-state Circuits, 2004
    Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
    Abstract:

    This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method upgrades picture quality. The quality is almost equal to that of a full search method. An SIMD datapath architecture optimized for the algorithm decreases a clock frequency and supply voltage. A dedicated three-port SRAM macro for image data caches of the Processor is newly designed to reduce power consumption. It has been fabricated with 0.18-μm five-layer metal CMOS technology. The VLSI processing QCIF 15-f/s video consumes 0.4-mW power at 0.85-MHz clock frequency with 1.0-V supply voltage. It is applicable to mobile video applications.