The Experts below are selected from a list of 588 Experts worldwide ranked by ideXlab platform
Radu Marculescu - One of the best experts on this subject based on the ideXlab platform.
-
DATE - SVR-NoC: a performance analysis tool for network-on-chips using learning-based support vector regression model
Design Automation & Test in Europe Conference & Exhibition (DATE) 2013, 2013
Co-Authors: Zhiliang Qian, Da-cheng Juan, Paul Bogdan, Chi-ying Tsui, Diana Marculescu, Radu Marculescu
Abstract: In this work, we propose SVR-NoC, a learning-based support vector regression (SVR) model for evaluating Network-on-Chip (NoC) latency performance. Different from the state-of-the-art NoC analytical model, which uses classical queuing theory to directly compute the average channel waiting time, the proposed SVR-NoC model performs NoC latency analysis based on learning the typical training data. More specifically, we develop a systematic machine-learning framework that uses the kernel-based support vector regression method to predict the channel average waiting time and the traffic flow latency. Experimental results show that SVR-NoC can predict the average packet latency accurately while achieving about 120X speed-up over simulation-based evaluation methods.
-
"It's a small world after all": NoC performance optimization via long-range link insertion
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture which is neither regular nor fully customized. Instead, the communication architecture we propose is a superposition of a few long-range links and a standard mesh network. The few application-specific long-range links we insert significantly increase the critical traffic workload at which the network transitions from a free to a congested state. This way, we can exploit the benefits offered by both complete regularity and partial topology customization. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and a major improvement in the achievable network throughput with minimal impact on network topology.
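A rough way to see why a few long-range links matter is to compare the average hop count of a plain mesh with that of the same mesh plus one shortcut. The sketch below is illustrative only: the 4 x 4 size, the corner-to-corner link, and the function names `mesh_links` and `average_hops` are assumptions, and hop count is only a proxy for the latency and critical-load metrics reported in the abstract. It is not the authors' synthesis methodology.

```python
from collections import deque
from itertools import product


def mesh_links(n):
    """Adjacency dict of an n x n mesh with bidirectional links."""
    adj = {(x, y): set() for x, y in product(range(n), repeat=2)}
    for x, y in adj:
        for dx, dy in ((1, 0), (0, 1)):
            nb = (x + dx, y + dy)
            if nb in adj:
                adj[(x, y)].add(nb)
                adj[nb].add((x, y))
    return adj


def average_hops(adj):
    """Mean shortest-path hop count over all ordered source/destination pairs."""
    total, pairs = 0, 0
    for src in adj:
        # Breadth-first search gives hop distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


mesh = mesh_links(4)
base = average_hops(mesh)

# Hypothetical long-range link between two opposite corners.
mesh[(0, 0)].add((3, 3))
mesh[(3, 3)].add((0, 0))
shortcut = average_hops(mesh)

print(f"plain mesh: {base:.3f} hops, with one long-range link: {shortcut:.3f} hops")
```

A real placement would weight each source/destination pair by the application's traffic, which is what makes the inserted links application-specific.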
-
ICCAD - Application-specific network-on-chip architecture customization via long-range link insertion
ICCAD-2005. IEEE ACM International Conference on Computer-Aided Design 2005, 2005
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transits from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the network achievable throughput.
-
Application-specific network-on-chip architecture customization via long-range link insertion
IEEE ACM International Conference on Computer-Aided Design Digest of Technical Papers ICCAD, 2005
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transits from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the network achievable throughput.
Umit Y. Ogras - One of the best experts on this subject based on the ideXlab platform.
-
"It's a small world after all": NoC performance optimization via long-range link insertion
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture which is neither regular nor fully customized. Instead, the communication architecture we propose is a superposition of a few long-range links and a standard mesh network. The few application-specific long-range links we insert significantly increase the critical traffic workload at which the network transitions from a free to a congested state. This way, we can exploit the benefits offered by both complete regularity and partial topology customization. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and a major improvement in the achievable network throughput with minimal impact on network topology.
-
ICCAD - Application-specific network-on-chip architecture customization via long-range link insertion
ICCAD-2005. IEEE ACM International Conference on Computer-Aided Design 2005, 2005
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transits from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the network achievable throughput.
-
Application-specific network-on-chip architecture customization via long-range link insertion
IEEE ACM International Conference on Computer-Aided Design Digest of Technical Papers ICCAD, 2005
Co-Authors: Umit Y. Ogras, Radu Marculescu
Abstract: Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transits from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the network achievable throughput.
Axel Jantsch - One of the best experts on this subject based on the ideXlab platform.
-
ASICON - Performance analysis of on-chip bufferless router with multi-ejection ports
2015 IEEE 11th International Conference on ASIC (ASICON), 2015
Co-Authors: Chaochao Feng, Axel Jantsch, Zhonghai Lu, Zhuofan Liao, Zhenyu Zhao
Abstract: In general, the bufferless NoC router has only one local output port for ejection, which may lead to multiple arriving flits competing for that single output port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3 and 4. Simulation results demonstrate that the average packet latency of the routers with multi-ejection ports is on average 18%, 10%, 6%, 14%, 9% and 7% less than that of the router with 1 ejection port under six synthetic workloads, respectively. For application workloads, the average packet latency of the router with more than two ejection ports is only slightly better than that of the router with only one ejection port, a difference that can be neglected. Making a compromise between hardware cost and performance, it can be concluded that there is no need to implement bufferless routers with 3 and 4 ejection ports, as the router with 2 ejection ports can achieve almost the same performance as the routers with 3 and 4 ejection ports.
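The ejection-port contention described above can be illustrated with a toy count of flits that fail to eject in a given cycle. This is a minimal sketch under strong assumptions: the per-cycle arrival trace is made up, the name `deflected_flits` is hypothetical, and flits that miss ejection are simply counted rather than re-injected or deflected back into the network as they would be in a real bufferless router.

```python
def deflected_flits(arrivals_per_cycle, ejection_ports):
    """Count flits that cannot eject because all local ejection ports are taken."""
    return sum(max(0, n - ejection_ports) for n in arrivals_per_cycle)


# Hypothetical number of flits arriving at their destination router in each cycle.
arrivals = [0, 1, 2, 1, 3, 0, 2, 4]

# Blocked ejections for 1, 2, 3 and 4 ejection ports.
blocked = {ports: deflected_flits(arrivals, ports) for ports in (1, 2, 3, 4)}
print(blocked)
```

On this made-up trace, the second port removes more blocked ejections than the third and fourth ports combined, which is the kind of diminishing return the abstract reports.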
-
A Heuristic Framework for Designing and Exploring Deterministic Routing Algorithm for NoCs
Routing Algorithms in Networks-on-Chip, 2013
Co-Authors: Abbas Eslami Kiasari, Axel Jantsch, Zhonghai Lu
Abstract: In this chapter, we present a system-level framework for designing minimal deterministic routing algorithms for Networks-on-Chip (NoCs) that are customized for a set of applications. To this end, we first formulate an optimization problem of minimizing average packet latency in the network and then use the simulated annealing heuristic to solve this problem. To estimate the average packet latency we use a queueing-based analytical model which can capture the burstiness of the traffic. The proposed framework does not require virtual channels to guarantee deadlock freedom since routes are extracted from an acyclic channel dependency graph. Experiments with both synthetic and realistic workloads show the effectiveness of the approach. Results show that maximum sustainable throughput of the network is improved for different applications and architectures.
-
An Analytical Latency Model for Networks-on-Chip
IEEE Transactions on Very Large Scale Integration Systems, 2013
Co-Authors: Abbas Eslami Kiasari, Zhonghai Lu, Axel Jantsch
Abstract: We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topology with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.
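The inputs listed in this abstract (communication graph, mapping, deterministic routing) are typically first combined into per-channel traffic rates before any queueing analysis. The sketch below shows only that preliminary step, under assumptions that are not taken from the paper: a 2D mesh, dimension-ordered XY routing, and hypothetical task names, rates, and function names. It does not implement the authors' queueing model.

```python
from collections import defaultdict


def xy_route(src, dst):
    """Channels (node, next_node) used by dimension-ordered XY routing on a mesh."""
    (x, y), (dst_x, dst_y) = src, dst
    path = []
    while x != dst_x:  # travel along X first
        nxt = x + (1 if dst_x > x else -1)
        path.append(((x, y), (nxt, y)))
        x = nxt
    while y != dst_y:  # then along Y
        nxt = y + (1 if dst_y > y else -1)
        path.append(((x, y), (x, nxt)))
        y = nxt
    return path


def channel_loads(flows, mapping):
    """Aggregate the rate of every flow onto each channel its route crosses."""
    load = defaultdict(float)
    for (src_task, dst_task), rate in flows.items():
        for channel in xy_route(mapping[src_task], mapping[dst_task]):
            load[channel] += rate
    return dict(load)


def weighted_average_hops(flows, mapping):
    """Traffic-weighted mean hop count over all flows."""
    total_rate = sum(flows.values())
    hops = sum(rate * len(xy_route(mapping[s], mapping[d]))
               for (s, d), rate in flows.items())
    return hops / total_rate


# Hypothetical communication graph (packets per cycle) and task-to-tile mapping.
flows = {("a", "b"): 0.2, ("a", "c"): 0.1, ("b", "c"): 0.3}
mapping = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

loads = channel_loads(flows, mapping)
avg_hops = weighted_average_hops(flows, mapping)
print(loads, avg_hops)
```

A queueing-based model would then turn each channel's arrival rate into a waiting-time estimate and sum those estimates along every flow's route.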
-
A 1-Cycle 1.25GHz Bufferless Router for 3D Network-on-Chip
IEICE Transactions on Information and Systems, 2012
Co-Authors: Chaochao Feng, Axel Jantsch, Zhonghai Lu, Minxuan Zhang
Abstract: In this paper, we propose a 1-cycle high-performance 3D bufferless router with a 3-stage permutation network. The proposed router utilizes the 3-stage permutation network instead of the serialized switch allocator and 7 x 7 crossbar to achieve the frequency of 1.25 GHz in TSMC 65 nm technology. Compared with the other two 3D bufferless routers, the proposed router occupies less area and consumes less power. Simulation results under both synthetic and application workloads illustrate that the proposed router achieves less average packet latency than the other two 3D bufferless routers.
-
ISVLSI - A Low-Overhead Fault-Aware Deflection Routing Algorithm for 3D Network-on-Chip
2011 IEEE Computer Society Annual Symposium on VLSI, 2011
Co-Authors: Chaochao Feng, Zhonghai Lu, Minxuan Zhang, Jinwen Li, Jiang Jiang, Axel Jantsch
Abstract: This paper proposes a low-overhead fault-tolerant deflection routing algorithm for 3D NoC, which uses a layer routing table and two TSV state vectors to make efficient routing decisions that avoid both TSV and horizontal link faults. The proposed switch is implemented in hardware with TSMC 65nm technology and can achieve 250MHz. Compared with a reinforcement-learning-based fault-tolerant deflection switch with a global routing table, the proposed switch occupies 40% less area and consumes 49% less power. Simulation results demonstrate that the proposed switch has 5% less average packet latency than the switch with the global routing table under real application workloads and with only 5% performance degradation under synthetic workloads in the presence of 10% link faults.
Michael Opoku Agyeman - One of the best experts on this subject based on the ideXlab platform.
-
SBAC-PAD Workshops - An Efficient 2D Router Architecture for Extending the Performance of Inhomogeneous 3D NoC-Based Multi-Core Architectures
2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2016
Co-Authors: Michael Opoku Agyeman, Wen Zong
Abstract: To meet the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing with the performance bottleneck of conventional metal based interconnects, alternative interconnect fabrics such as inhomogeneous three dimensional integrated Network-on-Chip (3D NoC) have emerged as a cost-effective solution for emerging multi-core design. However, these interconnects trade off optimized performance for cost by restricting the number of area and power hungry 3D routers. Consequently, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in inhomogeneous 3D NoCs. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in the network to reduce the average packet latency under various traffic loads. Simulation shows that the proposed router can reduce the average packet delay by an average of 45% in 3D NoCs.
-
HETEROGENEOUS 3D NETWORK-ON-CHIP ARCHITECTURES: AREA AND POWER AWARE DESIGN TECHNIQUES
Journal of Circuits Systems and Computers, 2013
Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia, Alireza Shahrabi
Abstract: Three-dimensional Network-on-Chip (3D NoC) architectures have gained a lot of popularity to solve the on-chip communication delays of next generation System-on-Chip (SoC) systems. However, the vertical interconnects of 3D NoC are expensive and complex to manufacture. Also, 3D router architecture consumes more power and occupies more area per chip floorplan compared to a 2D router. Hence, more efficient architectures should be designed. In this paper, we propose area efficient and low power 3D heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D NoC architectures. Experimental results show a negligible penalty (less than 5%) in average packet latency of the proposed heterogeneous 3D NoC architectures compared to typical homogeneous 3D NoCs, while the heterogeneity provides power and area efficiency of up to 61% and 19.7%, respectively.
-
Power and area optimisation in heterogeneous 3D networks-on-chip architectures
ACM Sigarch Computer Architecture News, 2011
Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia
Abstract: Three dimensional Network-on-Chip (3D NoC) architectures have evolved with a lot of interest to address the on-chip communication delays of modern SoC systems. However, the vertical interconnections between layers are more power and area hungry compared to 2D interconnections. In this paper we propose area efficient and low power heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D mesh topologies. Experimental results show a negligible penalty of up to 5% in average packet latency of 3D homogeneous NoC with bus hybrid routers. The heterogeneity however provides superiority of up to 67% and 19.7% in power and area efficiency of the NoC resources, respectively.
-
HPCS - Low power heterogeneous 3D Networks-on-Chip architectures
2011 International Conference on High Performance Computing & Simulation, 2011
Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia, Alireza Shahrabi
Abstract: Three dimensional Network-on-Chip (3D NoC) architectures have evolved with a lot of interest to address the on-chip communication delays of modern SoC systems. In this paper we propose low power heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D mesh topologies. Experimental results show a negligible penalty of up to 5% in average packet latency of 3D mesh with homogeneous distribution of 3D NoC-bus hybrid routers. The heterogeneity however provides superiority of up to 67% and 19.7% in total crossbar area and power efficiency of the NoC resources, respectively, compared to that of 3D mesh with homogeneous distribution of 3D NoC-bus hybrid routers.
Natalie Enright Jerger - One of the best experts on this subject based on the ideXlab platform.
-
ISLPED - Muffin: Minimally-Buffered Zero-Delay Power-Gating Technique in On-Chip Routers
2019 IEEE ACM International Symposium on Low Power Electronics and Design (ISLPED), 2019
Co-Authors: Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie Enright Jerger
Abstract: Although conventional Network-on-Chip (NoC) designs provide high bandwidth, many modern applications for many-core architectures have significant periods of low NoC utilization. Highly provisioned NoCs provide the required performance during periods of high activity; yet, large NoC designs come with high power costs. Furthermore, as technology shrinks, the contribution of static power increases. Hence, numerous NoC power-gating techniques have been proposed to alleviate the growing contribution of static power. However, the efficiency of power-gating techniques decreases due to sporadic packet arrivals across a range of injection rates. In this paper, we propose Minimally-Buffered Router Infrastructure (Muffin), which increases the number of traversals that can be made without needing to power on the routers. Empirical results on SPLASH-2 show that, compared to a conventional power-gating scheme, Muffin improves static power consumption by an average of 95.4%, while improving the average packet latency by 73.7%.
-
ISLPED - SPONGE: A Scalable Pivot-based On/Off Gating Engine for Reducing Static Power in NoC Routers
Proceedings of the International Symposium on Low Power Electronics and Design - ISLPED '18, 2018
Co-Authors: Hossein Farrokhbakht, Natalie Enright Jerger, Hadi Mardani Kamali, Shaahin Hessabi
Abstract: Due to high aggregate idle time of Networks-on-Chip (NoCs) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, the sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE) which efficiently manages power-gating decisions and routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7% but also improves average packet latency by 63%.
-
Moths: Mobile threads for on-chip networks
ACM Transactions on Embedded Computing Systems, 2013
Co-Authors: Matthew Misler, Natalie Enright Jerger
Abstract: As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in long-distance communication. Long-distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate problems created by long distance communication. This article presents Moths, an efficient runtime algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.
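The idea that moving a thread closer to the data it communicates with lowers traffic can be shown with a toy cost model. The sketch below is not the Moths algorithm: the packets-times-hops cost, the greedy single-move search, and every name and number in it (`traffic_cost`, `best_migration`, the placement and traffic tables) are assumptions chosen for illustration.

```python
def manhattan(a, b):
    """Hop distance between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic_cost(placement, traffic):
    """Sum of packets x hops over all communicating thread pairs."""
    return sum(count * manhattan(placement[src], placement[dst])
               for (src, dst), count in traffic.items())


def best_migration(placement, traffic, free_cores):
    """Return (thread, core, cost) for the single move that lowers cost most, else None."""
    current = traffic_cost(placement, traffic)
    best = None
    for thread in placement:
        for core in free_cores:
            trial = {**placement, thread: core}
            cost = traffic_cost(trial, traffic)
            if cost < current and (best is None or cost < best[2]):
                best = (thread, core, cost)
    return best


# Hypothetical thread placement, observed packet counts, and idle cores.
placement = {"t0": (0, 0), "t1": (3, 3), "t2": (0, 1)}
traffic = {("t0", "t1"): 100, ("t0", "t2"): 10}
free_cores = [(1, 0), (2, 2)]

before = traffic_cost(placement, traffic)
move = best_migration(placement, traffic, free_cores)
print(before, move)
```

A runtime scheme would also have to account for the cost of the migration itself, such as moving thread state and warming caches, which this sketch ignores.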
-
PACT - Moths: mobile threads for on-chip networks
Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10, 2010
Co-Authors: Matthew Misler, Natalie Enright Jerger
Abstract: As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in long distance communication. Long distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate problems created by long distance communication. We present Moths, an efficient run-time algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.
-
MICRO - SCARAB: a single cycle adaptive routing and bufferless network
Proceedings of the 42nd Annual IEEE ACM International Symposium on Microarchitecture - Micro-42, 2009
Co-Authors: Mitchell Hayenga, Natalie Enright Jerger, Mikko H. Lipasti
Abstract: As technology scaling drives the number of processor cores upward, current on-chip routers consume substantial portions of chip area and power budgets. Since existing research has greatly reduced router latency overheads and capitalized on available on-chip bandwidth, power constraints dominate interconnection network design. Recent research has proposed bufferless routers as a means to alleviate these constraints, but to date all designs exhibit poor operational frequency, throughput, or latency. In this paper, we propose an efficient bufferless router which lowers average packet latency by 17.6% and dynamic energy by 18.3% over existing bufferless on-chip network designs. In order to maintain the energy and area benefit of bufferless routers while delivering ultra-low latencies, our router utilizes an opportunistic processor-side buffering technique and an energy-efficient circuit-switched network for delivering negative acknowledgments for dropped packets.