The Experts below are selected from a list of 33972 Experts worldwide ranked by ideXlab platform
Tatsuo Ohtsuki - One of the best experts on this subject based on the ideXlab platform.
-
ASP-DAC - An interface-circuit synthesis method with configurable Processor Core in IP-based SoC designs
Proceedings of the 2006 conference on Asia South Pacific design automation - ASP-DAC '06, 2006
Co-Authors: Shunitsu Kohara, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Naoki Tomono, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: In SoC designs, efficient communication between the hardware IPs and the on-chip Processor becomes very important; however, the interface is usually affected by the Processor Core specification. In this paper, we therefore focus on developing an efficient interface circuit architecture for communication between the on-chip Processor and embedded hardware IP Cores, and we also propose a method to synthesize it. Experimental results show that our method can obtain optimal interface circuits and works well in the design of an MPEG-4 encode application.
-
Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
Co-Authors: Hideki Kawazu, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: A b-bit SIMD functional unit contains n k-bit sub-functional units, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a Processor Core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a Processor Core with small area under the given timing constraint. We expect to obtain Processor Core configurations of smaller area under the same timing constraint than with a conventional system. Promising experimental results are also shown.
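The reduction loop described in this abstract can be illustrated with a minimal sketch. The abstract does not give the cost models or the order in which units are visited, so everything below is an assumption made for illustration: the `SimdUnit` class, the `ops` count, the toy timing model (one cycle retires `parallelism` sub-operations) and the toy area model (proportional to total datapath width) are all hypothetical, and this is not the authors' algorithm.

```python
from dataclasses import dataclass


@dataclass
class SimdUnit:
    name: str
    k: int            # width of one sub-functional unit, in bits
    parallelism: int  # current number of k-bit sub-functional units (starts at n)
    ops: int          # hypothetical count of k-bit sub-operations issued to this unit


def cycles(unit: SimdUnit) -> int:
    # Toy timing model (assumption): each cycle retires `parallelism` sub-operations.
    return -(-unit.ops // unit.parallelism)  # ceiling division


def area(unit: SimdUnit) -> int:
    # Toy area model (assumption): proportional to total datapath width in bits.
    return unit.k * unit.parallelism


def reduce_parallelism(units, max_cycles):
    """Halve sub-operation parallelism unit by unit while the timing constraint holds."""
    changed = True
    while changed:
        changed = False
        for u in units:
            if u.parallelism <= 1:
                continue
            u.parallelism //= 2  # try n -> n/2 -> n/4 ...
            if sum(cycles(x) for x in units) <= max_cycles:
                changed = True
            else:
                u.parallelism *= 2  # constraint violated: undo the reduction
    return units


# Two 32-bit units (k = 8, n = 4) with different workloads.
units = [SimdUnit("add", k=8, parallelism=4, ops=40),
         SimdUnit("mul", k=8, parallelism=4, ops=8)]
area_before = sum(area(u) for u in units)
reduce_parallelism(units, max_cycles=20)
area_after = sum(area(u) for u in units)
```

In this toy run the lightly used `mul` unit shrinks to a single 8-bit sub-unit, while the heavily used `add` unit keeps all four, because halving it would break the 20-cycle budget.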
-
ASP-DAC - A Processor Core synthesis system in IP-based SoC design
Proceedings of the 2005 conference on Asia South Pacific design automation - ASP-DAC '05, 2005
Co-Authors: Naoki Tomono, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Shunitsu Kohara, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: This paper proposes a new design methodology for SoCs reusing hardware IPs. In our approach, after system-level HW/SW partitioning, we use IPs for the hardware parts, but synthesize a new Processor Core instead of reusing a Processor Core IP. The system performs efficient parallel execution of hardware and software by taking into account the response time of each hardware IP, obtained by the proposed calculation algorithm. We can use optimal hardware IPs selected by the proposed hardware IP selection algorithm. The experimental results show the effectiveness of our new design methodology.
-
ASP-DAC - A hardware/software partitioning algorithm for SIMD Processor Cores
Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC, 2003
Co-Authors: Koichi Tachikake, Jinku Choi, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
Abstract: This paper proposes a new hardware/software partitioning algorithm for Processor Cores with SIMD instructions. Given a compiled assembly code including SIMD instructions, a timing constraint on execution time, and available hardware units, the proposed algorithm synthesizes an area-optimized Processor Core with a new assembly code. First, we assume an initial Processor Core on which the input assembly code can run with the shortest execution time. Second, we remove the hardware units added to the Processor Core one by one while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new Processor configuration. By repeating this process, we finally obtain a Processor Core architecture with small area under the given timing constraint. We expect to obtain a Processor Core which has appropriate SIMD functional units for running the input application program. Promising experimental results are also shown.
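As a rough illustration of the partitioning loop in this abstract, the sketch below starts from a core that has every available hardware unit and removes units one at a time while the timing constraint holds. The abstract does not say which unit is removed first or how the assembly code is rewritten, so the largest-area-first rule, the fixed per-unit cycle penalty, and the names `shrink_core`, `base_cycles`, and `max_cycles` are assumptions made for illustration only.

```python
def shrink_core(units, base_cycles, max_cycles):
    """Greedy removal of hardware units under a timing constraint.

    units: dict mapping unit name -> (area, extra_cycles_if_removed).
           The cycle penalty stands in for re-running the affected
           instructions in software (a hypothetical, fixed cost).
    base_cycles: execution time with every unit present (the fastest core).
    max_cycles: timing constraint on execution time.
    """
    kept = dict(units)
    total = base_cycles
    while True:
        # Units whose removal still satisfies the timing constraint.
        candidates = [(area, name)
                      for name, (area, penalty) in kept.items()
                      if total + penalty <= max_cycles]
        if not candidates:
            break
        # Assumed heuristic: drop the removable unit with the largest area.
        _, victim = max(candidates)
        total += kept.pop(victim)[1]
    return kept, total


available = {
    "simd_mul": (50, 30),  # (area, extra cycles if removed): made-up numbers
    "simd_add": (20, 10),
    "shifter": (10, 5),
}
kept, total_cycles = shrink_core(available, base_cycles=100, max_cycles=120)
```

Here `simd_mul` survives despite its large area, because removing it would push execution time past the constraint; the two cheaper units are dropped.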
-
APCCAS (1) - An algorithm of hardware unit generation for Processor Core synthesis with packed SIMD type instructions
Asia-Pacific Conference on Circuits and Systems, 2002
Co-Authors: Yuichiro Miyaoka, Jinku Choi, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
Abstract: The authors consider the synthesis of a Processor Core with SIMD instructions by a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized Processor Core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of Processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts the set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short, and thus it can be easily incorporated into the Processor Core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.
Masahiko Yoshimoto - One of the best experts on this subject based on the ideXlab platform.
-
A 95 mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as HD camcorders. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.
-
A 95mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
Symposium on VLSI Circuits, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video applications such as an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.
-
A Sub-mW MPEG-4 Motion Estimation Processor Core for Mobile Video Application
IEEE Journal of Solid-state Circuits, 2004
Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
Abstract: This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces the required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method improves picture quality, which is almost equal to that of a full search method. A SIMD datapath architecture optimized for the algorithm lowers the clock frequency and supply voltage. A dedicated three-port SRAM macro for the image data caches of the Processor is newly designed to reduce power consumption. The Core has been fabricated with 0.18-µm five-layer metal CMOS technology. When processing QCIF 15-f/s video, the VLSI consumes 0.4 mW at a 0.85-MHz clock frequency with a 1.0-V supply voltage. It is applicable to mobile video applications.
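To give a feel for why a descent-style search needs far fewer block comparisons than a full search, here is a heavily simplified sketch. It is not the GDS algorithm of the paper, whose details the abstract does not give: it is a plain greedy descent over integer motion vectors using the sum of absolute differences (SAD), and the function names, the four-neighbour step pattern, and the `max_steps` limit are all assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def descent_search(cur, ref, bx, by, size, max_steps=16):
    """Greedy descent: move to the best of four neighbouring vectors
    until no neighbour lowers the SAD (a local minimum)."""
    h, w = len(ref), len(ref[0])

    def valid(dx, dy):
        # The displaced block must lie fully inside the reference frame.
        return (0 <= bx + dx and bx + dx + size <= w and
                0 <= by + dy and by + dy + size <= h)

    mv = (0, 0)
    best = sad(cur, ref, bx, by, 0, 0, size)
    for _ in range(max_steps):
        scored = [(sad(cur, ref, bx, by, dx, dy, size), (dx, dy))
                  for dx, dy in ((mv[0] + 1, mv[1]), (mv[0] - 1, mv[1]),
                                 (mv[0], mv[1] + 1), (mv[0], mv[1] - 1))
                  if valid(dx, dy)]
        if not scored:
            break
        cost, cand = min(scored)
        if cost >= best:
            break  # no neighbour improves: stop at this local minimum
        best, mv = cost, cand
    return mv, best


# Synthetic test frames: a linear intensity ramp, with the current frame
# equal to the reference frame shifted by (dx, dy) = (2, 1).
ref = [[10 * x + 100 * y for x in range(16)] for y in range(16)]
cur = [[10 * (x + 2) + 100 * (y + 1) for x in range(16)] for y in range(16)]
mv, cost = descent_search(cur, ref, bx=4, by=4, size=4)
```

On this smooth ramp the search reaches the true displacement after three steps, evaluating 17 candidate vectors in total. On real video a greedy descent like this can stall in a local minimum, which is presumably one reason the paper pairs GDS with a sub-block search method.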
Nozomu Togawa - One of the best experts on this subject based on the ideXlab platform.
-
ASP-DAC - An interface-circuit synthesis method with configurable Processor Core in IP-based SoC designs
Proceedings of the 2006 conference on Asia South Pacific design automation - ASP-DAC '06, 2006
Co-Authors: Shunitsu Kohara, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Naoki Tomono, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: In SoC designs, efficient communication between the hardware IPs and the on-chip Processor becomes very important; however, the interface is usually affected by the Processor Core specification. In this paper, we therefore focus on developing an efficient interface circuit architecture for communication between the on-chip Processor and embedded hardware IP Cores, and we also propose a method to synthesize it. Experimental results show that our method can obtain optimal interface circuits and works well in the design of an MPEG-4 encode application.
-
Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
Co-Authors: Hideki Kawazu, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: A b-bit SIMD functional unit contains n k-bit sub-functional units, where b = k × n. It can execute n-parallel k-bit operations. However, not all the b-bit functional units in a Processor Core necessarily execute n-parallel operations. Depending on the application program, some of them execute only n/2-parallel or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces the sub-operation parallelism of a SIMD functional unit while the timing constraint on execution time is satisfied. Thereby, we can finally find a Processor Core with small area under the given timing constraint. We expect to obtain Processor Core configurations of smaller area under the same timing constraint than with a conventional system. Promising experimental results are also shown.
-
ASP-DAC - A Processor Core synthesis system in IP-based SoC design
Proceedings of the 2005 conference on Asia South Pacific design automation - ASP-DAC '05, 2005
Co-Authors: Naoki Tomono, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Shunitsu Kohara, Jumpei Uchida, Tatsuo Ohtsuki
Abstract: This paper proposes a new design methodology for SoCs reusing hardware IPs. In our approach, after system-level HW/SW partitioning, we use IPs for the hardware parts, but synthesize a new Processor Core instead of reusing a Processor Core IP. The system performs efficient parallel execution of hardware and software by taking into account the response time of each hardware IP, obtained by the proposed calculation algorithm. We can use optimal hardware IPs selected by the proposed hardware IP selection algorithm. The experimental results show the effectiveness of our new design methodology.
-
ASP-DAC - A hardware/software partitioning algorithm for SIMD Processor Cores
Proceedings of the 2003 conference on Asia South Pacific design automation - ASPDAC, 2003
Co-Authors: Koichi Tachikake, Jinku Choi, Yuichiro Miyaoka, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
Abstract: This paper proposes a new hardware/software partitioning algorithm for Processor Cores with SIMD instructions. Given a compiled assembly code including SIMD instructions, a timing constraint on execution time, and available hardware units, the proposed algorithm synthesizes an area-optimized Processor Core with a new assembly code. First, we assume an initial Processor Core on which the input assembly code can run with the shortest execution time. Second, we remove the hardware units added to the Processor Core one by one while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new Processor configuration. By repeating this process, we finally obtain a Processor Core architecture with small area under the given timing constraint. We expect to obtain a Processor Core which has appropriate SIMD functional units for running the input application program. Promising experimental results are also shown.
-
APCCAS (1) - An algorithm of hardware unit generation for Processor Core synthesis with packed SIMD type instructions
Asia-Pacific Conference on Circuits and Systems, 2002
Co-Authors: Yuichiro Miyaoka, Jinku Choi, Masao Yanagisawa, Nozomu Togawa, Tatsuo Ohtsuki
Abstract: The authors consider the synthesis of a Processor Core with SIMD instructions by a hardware/software cosynthesis system. The system is required to configure functional units executing SIMD instructions and obtain the area and delay of the functional units to evaluate the synthesized Processor Core. This paper proposes a hardware unit generation algorithm for a hardware/software cosynthesis system of Processors with SIMD instructions. Given a set of instructions to be executed by a hardware unit and constraints on the area and delay of the hardware unit, the proposed algorithm extracts the set of subfunctions required by the hardware unit and generates more than one architecture candidate for the hardware unit. The algorithm also outputs the estimated area and delay of each of the generated hardware units. The execution time of the proposed algorithm is very short, and thus it can be easily incorporated into the Processor Core synthesis system. Experimental results demonstrate the effectiveness and efficiency of the algorithm.
Masayuki Miyama - One of the best experts on this subject based on the ideXlab platform.
-
A 95 mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as HD camcorders. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.
-
A 95mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
Symposium on VLSI Circuits, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video applications such as an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.
-
A Sub-mW MPEG-4 Motion Estimation Processor Core for Mobile Video Application
IEEE Journal of Solid-state Circuits, 2004
Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
Abstract: This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces the required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method improves picture quality, which is almost equal to that of a full search method. A SIMD datapath architecture optimized for the algorithm lowers the clock frequency and supply voltage. A dedicated three-port SRAM macro for the image data caches of the Processor is newly designed to reduce power consumption. The Core has been fabricated with 0.18-µm five-layer metal CMOS technology. When processing QCIF 15-f/s video, the VLSI consumes 0.4 mW at a 0.85-MHz clock frequency with a 1.0-V supply voltage. It is applicable to mobile video applications.
-
An Ultra Low Power Realtime MPEG2 MP@HL Motion Estimation Processor Core with SIMD Datapath Architecture Optimized for Gradient Descent Search Algorithm
Custom Integrated Circuits Conference, 2002
Co-Authors: Masayuki Miyama, O Tooyama, Naoki Takamatsu, Tsuyoshi Kodake, Kazuo Nakamura, A Kato, J Miyakoshi, K Hashimoto, Satoshi Komatsu, Mikio Yagi
Abstract: This paper describes a motion estimation (ME) Processor Core for realtime MP@HL video encoding. It is being fabricated with 0.13 µm CMOS technology and contains approximately 7 M transistors in a 4.50 mm × 3.35 mm area. The estimated power consumption is less than 100 mW at 81 MHz and 1.0 V. It features a gradient descent search (GDS) algorithm that drastically reduces the required computation power to 7 GOPS, an optimized SIMD datapath architecture that decreases the clock frequency and the operating voltage, and a low power 3-port data cache SRAM with a write-disturb-free cell array arrangement. The Core is applicable to a portable HDTV codec system.
Junichi Miyakoshi - One of the best experts on this subject based on the ideXlab platform.
-
A 95 mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95 mW MPEG2 MP@HL motion estimation Processor Core for portable and high-resolution video applications such as HD camcorders. It features a novel hierarchical algorithm and a low-power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920 × 1080@30 fps resolution video. The search range is ±128 × ±64 pixels. The ME Core integrates 2.25 M transistors in 3.1 mm × 3.1 mm using 0.18-micron technology.
-
A 95mW MPEG2 MP@HL Motion Estimation Processor Core for Portable High Resolution Video Application
Symposium on VLSI Circuits, 2005
Co-Authors: Yuichiro Murachi, Koji Hamano, Tetsuro Matsuno, Junichi Miyakoshi, Masayuki Miyama, Masahiko Yoshimoto
Abstract: This paper describes a 95mW MPEG2 MP@HL motion estimation Processor Core for portable and high resolution video applications such as an HD camcorder. It features a novel hierarchical algorithm and a low power ring-connected systolic array architecture. It supports frame/field and bi-directional prediction with half-pel precision for 1920×1080@30fps resolution video. The search range is ±128×±64. The ME Core integrates 2.25M transistors in 3.1mm×3.1mm using 0.18micron technology.
-
A Sub-mW MPEG-4 Motion Estimation Processor Core for Mobile Video Application
IEEE Journal of Solid-state Circuits, 2004
Co-Authors: Masayuki Miyama, Junichi Miyakoshi, Y Kuroda, Kousuke Imamura, Hideo Hashimoto, Masahiko Yoshimoto
Abstract: This paper describes a sub-mW motion estimation Processor Core for MPEG-4 video encoding. It features a gradient descent search (GDS) algorithm that reduces the required computational complexity to 15 MOPS. The GDS algorithm combined with a sub-block search method improves picture quality, which is almost equal to that of a full search method. A SIMD datapath architecture optimized for the algorithm lowers the clock frequency and supply voltage. A dedicated three-port SRAM macro for the image data caches of the Processor is newly designed to reduce power consumption. The Core has been fabricated with 0.18-µm five-layer metal CMOS technology. When processing QCIF 15-f/s video, the VLSI consumes 0.4 mW at a 0.85-MHz clock frequency with a 1.0-V supply voltage. It is applicable to mobile video applications.