Inductive Coupling

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 8475 Experts worldwide ranked by ideXlab platform

Tadahiro Kuroda - One of the best experts on this subject based on the ideXlab platform.

  • A 96-MB 3D-Stacked SRAM Using Inductive Coupling With 0.4-V Transmitter, Termination Scheme and 12:1 SerDes in 40-nm CMOS
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2021
    Co-Authors: Kota Shiba, Tatsuo Omori, Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Masato Motomura, Mototsugu Hamada, Tadahiro Kuroda
    Abstract:

    A 28.8-GB/s 96-MB 3D-stacked SRAM is presented. A total of eight SRAM dies, designed in a 40-nm CMOS process, are vertically stacked and connected using an Inductive Coupling wireless link with a low-voltage NMOS push-pull transmitter that reduces the power of the link by 35% with a 0.4-V power supply. The SRAM utilizes an inverted bit insertion scheme that compensates for the degradation of the first transmitted bit, a coil termination scheme that aims to eliminate the ringing of 3D Inductive Coupling bus, and a 12:1 SerDes that minimizes power consumption and area overhead in Inductive Coupling channels. Low-power, large-capacity, 3-cycle latency 3D-stacked SRAM for a DNN accelerator is achieved with the combination of these techniques to serve as a replacement of 3D-stacked DRAM. The performance of the proposed 3D-SRAM is compared with HBM DRAM and achieves more than 50% lower energy consumption. The scaling scenario of the SRAM module is discussed in light of the scaling of the Inductive Coupling technology and logic process.

  • An Inductive-Coupling Inter-Chip Link for High-Performance and Low-Power 3D System Integration
    IntechOpen, 2021
    Co-Authors: Kiichi Niitsu, Tadahiro Kuroda
    Abstract:

    This chapter presents the fundamental investigation and application of an Inductive-Coupling link. First, the interference from power/signal lines and to SRAM of an Inductive-Coupling link was investigated. Measurement results show that influence from line and space (I) is none and required normalized transmit power is 1.10 (line and space, type II) and 1.27 (mesh type) when metal density is 16%. The line and space type of power line is better for th

  • A 3D-Stacked SRAM Using Inductive Coupling With Low-Voltage Transmitter and 12:1 SerDes
    International Symposium on Circuits and Systems, 2020
    Co-Authors: Kota Shiba, Tatsuo Omori, Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Masato Motomura, Mototsugu Hamada, Tadahiro Kuroda
    Abstract:

    A 28.8-GB/s 96-MB 3D-stacked SRAM is presented. A total of eight SRAM dies, designed in a 40-nm CMOS process, are vertically stacked and connected using an Inductive Coupling wireless link with a low-voltage NMOS push-pull transmitter that reduces the power of the link by 45% with a 0.4-V power supply. The SRAM utilizes an inverted bit insertion scheme that compensates for the degradation of the first signal, a coil termination scheme that aims to eliminate the noise of the 3D Inductive Coupling bus, and a 12:1 SerDes. The data density of the SRAM should reach 12.3 MB/mm³, which extends beyond that of state-of-the-art stacked DRAMs.

  • QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS
    IEEE Journal of Solid-State Circuits, 2019
    Co-Authors: Kodai Ueyoshi, Tadahiro Kuroda, Kota Ando, Kazutoshi Hirose, Mototsugu Hamada, Shinya Takamaeda-Yamazaki, Masato Motomura
    Abstract:

    QUEST is a programmable multiple instruction, multiple data (MIMD) parallel accelerator for general-purpose state-of-the-art deep neural networks (DNNs). It features die-to-die stacking with three-cycle latency, 28.8 GB/s, 96 MB, and eight SRAMs using an Inductive Coupling technology called the ThruChip interface (TCI). By stacking the SRAMs instead of DRAMs, lower memory access latency and simpler hardware are expected. This facilitates balancing the memory capacity, latency, and bandwidth, all of which are in demand by cutting-edge DNNs at a high level. QUEST also introduces log-quantized programmable bit-precision processing for achieving faster (larger) DNN computation (size) in a 3-D module. It can sustain higher recognition accuracy at a lower bitwidth region compared to linear quantization. The prototype QUEST chip is integrated in 40-nm CMOS technology, and it achieves 7.49 tera operations per second (TOPS) peak performance in binary precision, and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
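
The contrast between log and linear quantization mentioned above can be illustrated with a minimal sketch. It assumes a generic sign-plus-power-of-two encoding with one code reserved for zero; the function names, the 4-bit width, and the exponent window are illustrative assumptions, not QUEST's actual number format.

```python
import math


def log_quantize(x, bits=4, max_exp=0):
    """Round x to sign * 2**e (hypothetical format: 1 sign bit, the rest
    select an exponent; one magnitude code is reserved for zero)."""
    levels = 2 ** (bits - 1) - 1        # number of non-zero magnitude codes
    min_exp = max_exp - (levels - 1)    # smallest representable exponent
    if x == 0:
        return 0.0
    e = round(math.log2(abs(x)))        # nearest exponent in the log domain
    if e < min_exp:
        return 0.0                      # too small: falls to the zero code
    e = min(e, max_exp)                 # too large: saturate
    return math.copysign(2.0 ** e, x)


def linear_quantize(x, bits=4, max_abs=1.0):
    """Uniform signed quantizer with the same bit budget, for comparison."""
    steps = 2 ** (bits - 1) - 1
    clipped = max(-max_abs, min(max_abs, x))
    return round(clipped / max_abs * steps) / steps * max_abs


small = 0.03
log_small = log_quantize(small)         # keeps a non-zero value
lin_small = linear_quantize(small)      # collapses to zero at 4 bits
```

In this toy setting a small value such as 0.03 survives log quantization as 2⁻⁵ but is rounded to zero by the 4-bit uniform quantizer, which is one intuition for why a log scale can hold accuracy at low bit widths.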

  • Building Block Multi-Chip Systems Using Inductive Coupling Through Chip Interface
    International SoC Design Conference, 2017
    Co-Authors: Hideharu Amano, Tadahiro Kuroda, Hiroki Matsutani, Hiroshi Nakamura, Kimiyoshi Usami, Masaaki Kondo, Mitaro Namiki
    Abstract:

    A building block computing system consists of multiple chips connected by an Inductive Coupling wireless through-chip interconnect. Like building with Lego blocks, various types of systems can be built by stacking different types of chips. To develop such systems, several techniques are investigated in the project: an Inductive Coupling wireless through-chip interface, low-power circuit technologies, autonomous interconnection network architectures, thermal dissipation, and a building block operating system. Here, an overview of the project and a prototype system are introduced.

Noriyuki Miura - One of the best experts on this subject based on the ideXlab platform.

  • A Scalable 3D Heterogeneous Multi-Core Processor With Inductive-Coupling ThruChip Interface
    IEEE Hot Chips Symposium, 2013
    Co-Authors: Noriyuki Miura, Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Hideharu Amano, Mitaro Namiki, Yusuke Koizumi, Eiichi Sasaki, Ryuichi Sakamoto, Kimiyoshi Usami
    Abstract:

    Recent battery-driven IT devices, including smartphones and tablets, require versatile functions and high performance with low energy consumption. On the other hand, the initial cost of LSI design and mask development has increased rapidly, and developing an SoC (System-on-a-Chip) for each product has become difficult. Although flexible reconfigurable architectures can be a solution, performance scalability is also necessary to cope with the wide performance range of products. As a solution, a heterogeneous multi-core system using a 3-D wireless Inductive Coupling interconnect is proposed. This system consists of a MIPS-R3000 compatible embedded CPU and reconfigurable accelerators. Since chips are connected with wireless Inductive Coupling channels, the number and types of accelerators can be tailored easily depending on the requirements of the product.

  • A Scalable 3D Heterogeneous Multi-Core Processor With Inductive-Coupling ThruChip Interface
    IEEE Micro, 2013
    Co-Authors: Noriyuki Miura, Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Hideharu Amano, Mitaro Namiki, Yusuke Koizumi, Eiichi Sasaki, Ryuichi Sakamoto, Kimiyoshi Usami
    Abstract:

    A scalable heterogeneous multi-core processor is developed. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multi-core accelerators improves computational energy efficiency by proper task assignment and massive parallel computing. The stacked chips interconnect through a scalable 3D Network on Chip (NoC). By simply changing the number of stacked accelerator chips, processor parallelism can be widely scaled. In combination with Dynamic Voltage and Frequency Scaling (DVFS), the energy efficiency can be optimized for various performance requirements. No design change is needed, and hence no additional Non-Recurring Engineering (NRE) cost. An Inductive-Coupling ThruChip Interface (TCI) is applied to stacked-chip communications, forming a low-cost and robust high-speed 3D NoC. A prototype demonstration system has been developed with 65-nm CMOS test chips. Successful system operations, including 10-hour continuous Linux OS operation, are confirmed for the first time.

  • A 0.025-0.45 W 60%-Efficiency Inductive-Coupling Power Transceiver With 5-Bit Dual-Frequency Feedforward Control for Non-Contact Memory Cards
    IEEE Journal of Solid-State Circuits, 2012
    Co-Authors: Hayun Chung, Noriyuki Miura, Hiroki Ishikuro, A Radecki, Tadahiro Kuroda
    Abstract:

    A 0.025-0.45 W Inductive-Coupling power transceiver for non-contact memory applications is presented. To deal with sudden and large load variations and achieve high efficiency, we propose a power transceiver with 5-bit feedforward control. Knowing that load patterns of a memory card have strong correlation with commands issued by a host, feedforward control is applied to minimize response times. To achieve 5-bit power levels, the proposed transceiver utilizes pulse-density modulation (PDM) and a multi-channel structure. Different operation frequencies are chosen for each channel to maximize power transfer efficiency. To further improve transceiver efficiency and enable high-speed operation, an active rectifier with fast positive feedback is proposed. The test prototype demonstrates 40%-70% efficiency across all load conditions and 60% efficiency on average, which is over an order of magnitude improvement compared to prior art.
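
As a rough illustration of how pulse-density modulation can express a 5-bit level, the sketch below uses a generic first-order accumulator. The abstract does not say how the pulse pattern is generated or how it is split across channels, so the generator, the function name, and the 31-slot window are assumptions for illustration only.

```python
def pdm_stream(level, n_slots, bits=5):
    """Return a list of 0/1 pulses whose density approximates
    level / (2**bits - 1), using a first-order accumulator.
    This is a textbook PDM generator, not the circuit from the paper."""
    full_scale = 2 ** bits - 1
    if not 0 <= level <= full_scale:
        raise ValueError("level out of range for the given bit width")
    acc = 0
    pulses = []
    for _ in range(n_slots):
        acc += level                 # integrate the requested level
        if acc >= full_scale:        # overflow: emit a pulse this slot
            acc -= full_scale
            pulses.append(1)
        else:
            pulses.append(0)
    return pulses


# Over one 31-slot window, level k produces exactly k pulses,
# spread as evenly as the accumulator allows.
half_power = pdm_stream(16, 31)
```

A feedforward controller in this picture would simply pick `level` from the host command that is about to be issued, rather than waiting for the load to change.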

  • A 65 fJ/b Inter-Chip Inductive-Coupling Data Transceivers Using Charge-Recycling Technique for Low-Power Inter-Chip Communication in 3-D System Integration
    IEEE Transactions on Very Large Scale Integration Systems, 2012
    Co-Authors: Kiichi Niitsu, Noriyuki Miura, Hiroki Ishikuro, Shusuke Kawai, Tadahiro Kuroda
    Abstract:

    This paper presents a low-power Inductive-Coupling link in 90-nm CMOS. Our newly proposed transmitter circuit uses a charge-recycling technique for power-aware 3-D system integration. The cross-type daisy chain enables charge recycling and achieves power reduction without sacrificing communication performance such as a high timing margin, low bit error rate, and high bandwidth. There are two design issues in the cross-type daisy chain: pulse amplitude reduction and inter-channel skew. To compensate for these issues, an inductor design and a replica circuit are proposed and investigated. Test chips were designed and fabricated in 90-nm CMOS to verify the validity of the proposed transmitter. Measurements revealed that the proposed cross-type daisy chain transmitter achieved an energy efficiency of 65 fJ/bit without degrading the timing margin, data rate, or bit error rate. To investigate the compatibility of the transmitter with technology scaling, a simulation of each technology node was performed. The simulation results indicate that the energy dissipation can potentially be reduced to less than 10 fJ/bit in 22-nm CMOS with the proposed cross-type daisy chain.

  • A 30 Gb/s/link 2.2 Tb/s/mm² Inductively Coupled Injection-Locking CDR for High-Speed DRAM Interface
    IEEE Journal of Solid-State Circuits, 2011
    Co-Authors: Yasuhiro Take, Noriyuki Miura, Tadahiro Kuroda
    Abstract:

    This paper presents a 30 Gb/s/link 2.2 Tb/s/mm² Inductive-Coupling link for a high-speed DRAM interface. The data rate per layout area is the highest among DRAM interfaces reported up to now. The proposed interface employs a high-speed injection-locking CDR technique that utilizes the derivative property of Inductive Coupling. Compared to conventional injection-locking CDR based on an XOR edge detector, the proposed technique doubles the operation speed and increases the data rate to 30 Gb/s/link. As a result, the data rate per layout area is increased to 2.2 Tb/s/mm², which is 2× that of the state-of-the-art Inductive-Coupling link and 22× that of the state-of-the-art wired link.

Hiroki Ishikuro - One of the best experts on this subject based on the ideXlab platform.

  • A 0.025-0.45 W 60%-Efficiency Inductive-Coupling Power Transceiver With 5-Bit Dual-Frequency Feedforward Control for Non-Contact Memory Cards
    IEEE Journal of Solid-State Circuits, 2012
    Co-Authors: Hayun Chung, Noriyuki Miura, Hiroki Ishikuro, A Radecki, Tadahiro Kuroda
    Abstract:

    A 0.025-0.45 W Inductive-Coupling power transceiver for non-contact memory applications is presented. To deal with sudden and large load variations and achieve high efficiency, we propose a power transceiver with 5-bit feedforward control. Knowing that load patterns of a memory card have strong correlation with commands issued by a host, feedforward control is applied to minimize response times. To achieve 5-bit power levels, the proposed transceiver utilizes pulse-density modulation (PDM) and a multi-channel structure. Different operation frequencies are chosen for each channel to maximize power transfer efficiency. To further improve transceiver efficiency and enable high-speed operation, an active rectifier with fast positive feedback is proposed. The test prototype demonstrates 40%-70% efficiency across all load conditions and 60% efficiency on average, which is over an order of magnitude improvement compared to prior art.

  • A 65 fJ/b Inter-Chip Inductive-Coupling Data Transceivers Using Charge-Recycling Technique for Low-Power Inter-Chip Communication in 3-D System Integration
    IEEE Transactions on Very Large Scale Integration Systems, 2012
    Co-Authors: Kiichi Niitsu, Noriyuki Miura, Hiroki Ishikuro, Shusuke Kawai, Tadahiro Kuroda
    Abstract:

    This paper presents a low-power Inductive-Coupling link in 90-nm CMOS. Our newly proposed transmitter circuit uses a charge-recycling technique for power-aware 3-D system integration. The cross-type daisy chain enables charge recycling and achieves power reduction without sacrificing communication performance such as a high timing margin, low bit error rate, and high bandwidth. There are two design issues in the cross-type daisy chain: pulse amplitude reduction and inter-channel skew. To compensate for these issues, an inductor design and a replica circuit are proposed and investigated. Test chips were designed and fabricated in 90-nm CMOS to verify the validity of the proposed transmitter. Measurements revealed that the proposed cross-type daisy chain transmitter achieved an energy efficiency of 65 fJ/bit without degrading the timing margin, data rate, or bit error rate. To investigate the compatibility of the transmitter with technology scaling, a simulation of each technology node was performed. The simulation results indicate that the energy dissipation can potentially be reduced to less than 10 fJ/bit in 22-nm CMOS with the proposed cross-type daisy chain.

  • Wireless Proximity Interfaces With a Pulse-Based Inductive Coupling Technique
    IEEE Communications Magazine, 2010
    Co-Authors: Hiroki Ishikuro, Tadahiro Kuroda
    Abstract:

    The rapid performance progress in processors and memory cores by technology scaling requires further improvement in interface bandwidth. However, interface bandwidth is not keeping up with the processing speed of the core and is becoming a bottleneck in system performance. To fill the performance gap, wideband, low-power, low-cost interfaces are in strong demand. A wireless proximity interface that uses Inductive Coupling is one such interface expected to be used for interchip links in high-performance 3D system integration. Inductive Coupling interfaces use the magnetic near-field induced by micro-coils. The coils (channels) can be arranged in a dense array because the magnetic near-field localizes in the proximity of each coil, and crosstalk between the channels is small. Therefore, Inductive Coupling interfaces are suitable for wideband low-cost proximity communication. An Inductive Coupling interface can also realize highly reliable communication with low power consumption. Evaluation systems developed to study the performance of Inductive Coupling interfaces have demonstrated the feasibility of the interfaces in a wide range of applications.

  • A 2.5 Gb/s/ch 4PAM Inductive-Coupling Transceiver for Non-Contact Memory Card
    International Solid-State Circuits Conference, 2010
    Co-Authors: Shusuke Kawai, Hiroki Ishikuro, Tadahiro Kuroda
    Abstract:

    An Inductive-Coupling link has been studied for inter-chip communications in System-in-a-Package [1]. Its communication distance extends to millimeter ranges [2,3], and it can be used as a wireless interface for non-contact memory cards. High-speed and low-power communication can be performed in the Inductive-Coupling link because of the removal of highly capacitive ESD protection devices [1]. The wireless interface eliminates mechanical contacts, resulting in high reliability. The target data rate is 2.5 Gb/s/ch, which is 12.5× higher than that of a commercial memory card, and the target communication range is 0.5 mm to 1 mm, considering the allowance of card insertion. The maximum data rate of the Inductive-Coupling link demonstrated in [3] at 1 mm distance was 160 Mb/s/ch. A theoretical limit is 1 Gb/s/ch since the self-resonant frequency of an on-chip inductor is 3 GHz. To increase the self-resonant frequency, the inductor is moved off chip to a flexible circuit board to reduce parasitic capacitance. The self-resonant frequency of a 1-mm-diameter inductor is increased to 4 GHz, corresponding to a signal data rate of 1.25 Gb/s/ch. Additionally, the number of bits per symbol is increased to 2 by 4-level pulse amplitude modulation (4PAM), and a data rate of 2.5 Gb/s/ch is achieved. However, to communicate using 4PAM in the Inductive-Coupling link, the issues listed below must be solved. First, the communication range is limited to 0.95 mm to 1 mm. The amplitude of the received signal is inversely proportional to the communication distance, and therefore the received signal cannot be converted to correct data without adjusting the input threshold voltages of a receiver. Second, the pulse width is narrower in 4PAM, and thus synchronization on the receiver side is difficult.
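
The data-rate arithmetic and the threshold problem described above can be sketched as follows. The 1/distance amplitude model is taken from the abstract; the symbol amplitudes (±1, ±3), the calibration distance, and all function names are hypothetical choices for illustration, not the transceiver's actual design.

```python
import math


def data_rate_gbps(symbol_rate_gsps, pam_levels):
    """Bit rate = symbol rate x bits per symbol (log2 of the level count)."""
    return symbol_rate_gsps * math.log2(pam_levels)


def received_amplitude(symbol, distance_mm, ref_distance_mm):
    """Amplitude inversely proportional to distance, normalized so that a
    symbol arrives unscaled at the (assumed) calibration distance."""
    return symbol * ref_distance_mm / distance_mm


def slice_4pam(v, scale=1.0):
    """Decide among symbols -3, -1, +1, +3 with thresholds at
    -2, 0, +2 times `scale`."""
    if v < -2 * scale:
        return -3
    if v < 0:
        return -1
    if v < 2 * scale:
        return 1
    return 3


rate = data_rate_gbps(1.25, 4)       # 1.25 Gsymbol/s x 2 bits/symbol

# Thresholds calibrated at 0.5 mm, card actually sitting at 1 mm:
v = received_amplitude(3, distance_mm=1.0, ref_distance_mm=0.5)
fixed = slice_4pam(v)                # fixed thresholds misread the symbol
adjusted = slice_4pam(v, scale=0.5)  # thresholds rescaled by 0.5 mm / 1 mm
```

With fixed thresholds the outer symbol +3 is read as +1 once the distance doubles, while rescaling the thresholds by the same factor as the signal recovers it. Two-level signaling only needs the zero threshold, which does not move with distance.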

  • 2 Gb/s 15 pJ/b/chip Inductive-Coupling Programmable Bus for NAND Flash Memory Stacking
    International Solid-State Circuits Conference, 2009
    Co-Authors: Mitsuko Saito, Noriyuki Miura, Hiroki Ishikuro, Yoshinori Kohama, Yasufumi Sugimori, Yoichi Yoshida, T Sakurai, Tadahiro Kuroda
    Abstract:

    A wireless communication technique is developed that enables a controller chip to communicate, with random access, with a stack of 64 NAND Flash memory chips underneath it at a data rate of 2 Gb/s using relayed transmission (Fig. 13.5.1). This technique can be applied to memory access in solid-state drives (SSD). The wireless interface allows the removal of a highly capacitive ESD protection device and results in a 2× reduction in power consumption and a 40× reduction in I/O circuit-layout area. Using bonding wires for the power supply and a wireless interface for data access reduces the number of bonding wires in the 64-chip stack from over 1,500 wires to fewer than 200 wires. This reduction in the number of bonding wires makes it possible to integrate 64 chips in one package, which conventionally requires eight separate packages. This wireless interface is based on Inductive Coupling between inductors on the stacked chips. The inductors emit a magnetic field both upwards and downwards. This creates both intentional and unintentional communication links, which makes the interface difficult to use in homogeneous stacking. Our technique enables data delivery upwards and downwards for memory read and write with measured BER ≪ 10⁻¹². Power reduction is achieved by proper state programming of individual chips.

Yasuhiro Take - One of the best experts on this subject based on the ideXlab platform.

  • Analytical ThruChip Inductive Coupling Channel Design Optimization
    Asia and South Pacific Design Automation Conference, 2016
    Co-Authors: Lichung Hsu, Yasuhiro Take, Junichiro Kadomoto, So Hasegawa, Atsutake Kosuge, Tadahiro Kuroda
    Abstract:

    ThruChip interface (TCI) is an emerging 3-D integrated circuit stacking technology. TCI utilizes on-chip inductors to build vertical communication channels at near-field distance and has been shown to be comparable with through-silicon vias (TSVs) in data rate, power, and reliability. Moreover, it is cost-effective to manufacture due to its wireless nature. In this paper, an analytical method is proposed to find a near-optimal TCI Inductive Coupling channel solution. The experimental results show an average 16.8% reduction in transmit current and a reduction in design time from days to a few minutes.

  • Efficient 3-D Bus Architectures for Inductive-Coupling ThruChip Interfaces
    IEEE Transactions on Very Large Scale Integration Systems, 2016
    Co-Authors: Takahiro Kagami, Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano
    Abstract:

    Wireless 3-D networks-on-chip (NoCs) with Inductive-Coupling ThruChip interfaces provide a large degree of flexibility for customizing the number of arbitrary chips in a package after chips have been fabricated. To simplify the vertical communication interfaces, static time division multiple access (TDMA) is used for the vertical broadcast buses, while arbitrary or customized topologies can be used for the intrachip network. This paper proposes two techniques to break through the simple static TDMA-based vertical buses while maintaining a simple communication interface. The first technique is headfirst sliding (HS) routing to reduce the waiting time for acquiring the communication time-slot. HS routing selects the best vertical bus based on the current time, taking advantage of static TDMA. The second technique extends carrier sense multiple access with collision detection (CSMA/CD) for vertical broadcast buses. We introduce a packet collision detection technique for Inductive-Coupling buses and propose two retransmission strategies to reduce the waiting time for packet retransmissions caused by collisions. Network simulation results show that HS routing reduces the communication latency by 39.1% compared with the conventional static TDMA bus-based 3-D NoC that uses shortest-path routing. The proposed CSMA/CD bus also improves the latency by 52.5% and throughput by 34.1%. The full-system simulation results show that HS routing and the proposed CSMA/CD technique reduce the application execution time accordingly while keeping the average flit transfer energy overhead modest.
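
The idea of choosing a vertical bus from the current time under static TDMA can be sketched as below. The slot-ownership rule (one slot per chip per frame, shifted by a per-bus offset) and all names are assumptions made for illustration; the paper's HS routing may also weigh factors such as the horizontal distance to each bus.

```python
def wait_cycles(now, owned_slot, frame_len):
    """Cycles until this chip's next owned slot in a repeating TDMA frame."""
    return (owned_slot - now) % frame_len


def pick_bus(now, chip_id, n_chips, bus_offsets):
    """Return (bus_index, wait) for the vertical bus whose owned slot
    comes up soonest. Assumes chip c owns slot (c + offset) % n_chips
    on a bus with the given offset (hypothetical assignment)."""
    candidates = []
    for bus, offset in enumerate(bus_offsets):
        owned = (chip_id + offset) % n_chips
        candidates.append((wait_cycles(now % n_chips, owned, n_chips), bus))
    wait, bus = min(candidates)      # shortest wait; ties go to lower index
    return bus, wait


# Four stacked chips, two vertical buses with staggered slot schedules.
choice_at_t2 = pick_bus(now=2, chip_id=1, n_chips=4, bus_offsets=[0, 2])
choice_at_t1 = pick_bus(now=1, chip_id=1, n_chips=4, bus_offsets=[0, 2])
```

Because the schedule is static, every router can evaluate this locally from a shared cycle counter, with no arbitration traffic on the buses themselves.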

  • 3D NoC With Inductive-Coupling Links for Building-Block SiPs
    IEEE Transactions on Computers, 2014
    Co-Authors: Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Daisuke Sasaki, Michihiro Koibuchi, Hideharu Amano
    Abstract:

    A wireless 3D NoC architecture is described for building-block SiPs, in which the number of hardware components (or chips) in a package can be changed after chips have been fabricated. The architecture uses Inductive-Coupling links that can connect more than two examined dies without wire connections. Each chip has data transceivers for the uplink and downlink in order to communicate with its neighboring chips in the package. These chips form a vertical unidirectional ring network so as to fully exploit the flexibility of the wireless approach that enables us to add, remove, and swap the chips in the ring. To avoid protocol and structural deadlocks in the ring, we use bubble flow control, which does not rely on the conventional VC-based deadlock avoidance mechanism. In addition, we propose a bidirectional communication scheme to form a bidirectional ring network by using the Inductive-Coupling transceivers that can dynamically change the communication modes, such as TX, RX, and Idle modes. This paper illustrates the Inductive-Coupling transceiver circuits, which can carry high data transfer rates of up to 8 Gbps per channel, for the wireless 3D NoC. It also illustrates an implementation of a wireless 3D NoC that has on-chip routers and transceivers implemented with a 65 nm process in order to show the feasibility of our proposal. The vertical bubble flow control and conventional VC-based approach on the uni- and bidirectional ring networks are compared with the vertical broadcast bus in terms of throughput, hardware amount, and application performance using a full system multiprocessor simulator. The results show that the proposed bidirectional communication scheme efficiently improves application performance without adding any Inductive-Coupling transceivers. In addition, the proposed vertical bubble flow network outperforms the conventional VC-based approach by 7.9-12.5 percent with a 33.5 percent smaller router area for building-block SiPs connecting up to eight chips.
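
The rule behind bubble flow control can be sketched at packet granularity as follows. This reflects the general technique (a new packet may enter the ring only if a free packet slot, the "bubble", remains afterwards), assuming packet-sized buffer slots; the class and method names are hypothetical and this is not the router implemented in the paper.

```python
class RingInputBuffer:
    """Packet buffer at one ring router input (capacity in whole packets)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = 0

    @property
    def free(self):
        return self.capacity - self.packets

    def accept(self, from_ring):
        """Packets already travelling on the ring need one free slot.
        Newly injected packets need two, so one bubble always remains
        and the ring can keep advancing without virtual channels."""
        needed = 1 if from_ring else 2
        if self.free >= needed:
            self.packets += 1
            return True
        return False


buf = RingInputBuffer(capacity=2)
first_injection = buf.accept(from_ring=False)    # two slots free: allowed
second_injection = buf.accept(from_ring=False)   # would consume the bubble
in_transit = buf.accept(from_ring=True)          # ring traffic may use it
```

The stricter condition applies only at injection, which is why the scheme can avoid ring deadlock with a single buffer class instead of the extra virtual channels a VC-based approach needs.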

  • A Scalable 3D Heterogeneous Multi-Core Processor With Inductive-Coupling ThruChip Interface
    IEEE Hot Chips Symposium, 2013
    Co-Authors: Noriyuki Miura, Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Hideharu Amano, Mitaro Namiki, Yusuke Koizumi, Eiichi Sasaki, Ryuichi Sakamoto, Kimiyoshi Usami
    Abstract:

    Recent battery-driven IT devices, including smartphones and tablets, require versatile functions and high performance with low energy consumption. On the other hand, the initial cost of LSI design and mask development has increased rapidly, and developing an SoC (System-on-a-Chip) for each product has become difficult. Although flexible reconfigurable architectures can be a solution, performance scalability is also necessary to cope with the wide performance range of products. As a solution, a heterogeneous multi-core system using a 3-D wireless Inductive Coupling interconnect is proposed. This system consists of a MIPS-R3000 compatible embedded CPU and reconfigurable accelerators. Since chips are connected with wireless Inductive Coupling channels, the number and types of accelerators can be tailored easily depending on the requirements of the product.

  • A Scalable 3D Heterogeneous Multi-Core Processor With Inductive-Coupling ThruChip Interface
    IEEE Micro, 2013
    Co-Authors: Noriyuki Miura, Yasuhiro Take, Tadahiro Kuroda, Hiroki Matsutani, Hideharu Amano, Mitaro Namiki, Yusuke Koizumi, Eiichi Sasaki, Ryuichi Sakamoto, Kimiyoshi Usami
    Abstract:

    A scalable heterogeneous multi-core processor is developed. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multi-core accelerators improves computational energy efficiency by proper task assignment and massive parallel computing. The stacked chips interconnect through a scalable 3D Network on Chip (NoC). By simply changing the number of stacked accelerator chips, processor parallelism can be widely scaled. In combination with Dynamic Voltage and Frequency Scaling (DVFS), the energy efficiency can be optimized for various performance requirements. No design change is needed, and hence no additional Non-Recurring Engineering (NRE) cost. An Inductive-Coupling ThruChip Interface (TCI) is applied to stacked-chip communications, forming a low-cost and robust high-speed 3D NoC. A prototype demonstration system has been developed with 65-nm CMOS test chips. Successful system operations, including 10-hour continuous Linux OS operation, are confirmed for the first time.

Masato Motomura - One of the best experts on this subject based on the ideXlab platform.

  • A 96-MB 3D-Stacked SRAM Using Inductive Coupling With 0.4-V Transmitter, Termination Scheme and 12:1 SerDes in 40-nm CMOS
    IEEE Transactions on Circuits and Systems I: Regular Papers, 2021
    Co-Authors: Kota Shiba, Tatsuo Omori, Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Masato Motomura, Mototsugu Hamada, Tadahiro Kuroda
    Abstract:

    A 28.8-GB/s 96-MB 3D-stacked SRAM is presented. A total of eight SRAM dies, designed in a 40-nm CMOS process, are vertically stacked and connected using an Inductive Coupling wireless link with a low-voltage NMOS push-pull transmitter that reduces the power of the link by 35% with a 0.4-V power supply. The SRAM utilizes an inverted bit insertion scheme that compensates for the degradation of the first transmitted bit, a coil termination scheme that aims to eliminate the ringing of 3D Inductive Coupling bus, and a 12:1 SerDes that minimizes power consumption and area overhead in Inductive Coupling channels. Low-power, large-capacity, 3-cycle latency 3D-stacked SRAM for a DNN accelerator is achieved with the combination of these techniques to serve as a replacement of 3D-stacked DRAM. The performance of the proposed 3D-SRAM is compared with HBM DRAM and achieves more than 50% lower energy consumption. The scaling scenario of the SRAM module is discussed in light of the scaling of the Inductive Coupling technology and logic process.

  • A 3D-Stacked SRAM Using Inductive Coupling With Low-Voltage Transmitter and 12:1 SerDes
    International Symposium on Circuits and Systems, 2020
    Co-Authors: Kota Shiba, Tatsuo Omori, Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Masato Motomura, Mototsugu Hamada, Tadahiro Kuroda
    Abstract:

    A 28.8-GB/s 96-MB 3D-stacked SRAM is presented. A total of eight SRAM dies, designed in a 40-nm CMOS process, are vertically stacked and connected using an Inductive Coupling wireless link with a low-voltage NMOS push-pull transmitter that reduces the power of the link by 45% with a 0.4-V power supply. The SRAM utilizes an inverted bit insertion scheme that compensates for the degradation of the first signal, a coil termination scheme that aims to eliminate the noise of the 3D Inductive Coupling bus, and a 12:1 SerDes. The data density of the SRAM should reach 12.3 MB/mm³, which extends beyond that of state-of-the-art stacked DRAMs.

  • QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS
    IEEE Journal of Solid-State Circuits, 2019
    Co-Authors: Kodai Ueyoshi, Tadahiro Kuroda, Kota Ando, Kazutoshi Hirose, Mototsugu Hamada, Shinya Takamaeda-Yamazaki, Masato Motomura
    Abstract:

    QUEST is a programmable multiple instruction, multiple data (MIMD) parallel accelerator for general-purpose state-of-the-art deep neural networks (DNNs). It features die-to-die stacking with three-cycle latency, 28.8 GB/s, 96 MB, and eight SRAMs using an Inductive Coupling technology called the ThruChip interface (TCI). By stacking the SRAMs instead of DRAMs, lower memory access latency and simpler hardware are expected. This facilitates balancing the memory capacity, latency, and bandwidth, all of which are in demand by cutting-edge DNNs at a high level. QUEST also introduces log-quantized programmable bit-precision processing for achieving faster (larger) DNN computation (size) in a 3-D module. It can sustain higher recognition accuracy at a lower bitwidth region compared to linear quantization. The prototype QUEST chip is integrated in 40-nm CMOS technology, and it achieves 7.49 tera operations per second (TOPS) peak performance in binary precision, and 1.96 TOPS in 4-bit precision at a 300-MHz clock.