Average Packet Latency

The Experts below are selected from a list of 588 Experts worldwide ranked by ideXlab platform

Radu Marculescu - One of the best experts on this subject based on the ideXlab platform.

  • DATE - SVR-NoC: a performance analysis tool for network-on-chips using learning-based support vector regression model
    Design Automation & Test in Europe Conference & Exhibition (DATE) 2013, 2013
    Co-Authors: Zhiliang Qian, Da-cheng Juan, Paul Bogdan, Chi-ying Tsui, Diana Marculescu, Radu Marculescu
    Abstract:

    In this work, we propose SVR-NoC, a learning-based support vector regression (SVR) model for evaluating Network-on-Chip (NoC) latency performance. Unlike the state-of-the-art NoC analytical model, which uses classical queuing theory to directly compute the average channel waiting time, the proposed SVR-NoC model performs NoC latency analysis by learning from typical training data. More specifically, we develop a systematic machine-learning framework that uses the kernel-based support vector regression method to predict the average channel waiting time and the traffic flow latency. Experimental results show that SVR-NoC can predict the average packet latency accurately while achieving about a 120X speed-up over simulation-based evaluation methods.

  • "It's a small world after all": NoC performance optimization via long-range link insertion
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006
    Co-Authors: Umit Y. Ogras, Radu Marculescu
    Abstract:

    Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture which is neither regular nor fully customized. Instead, the communication architecture we propose is a superposition of a few long-range links and a standard mesh network. The few application-specific long-range links we insert significantly increase the critical traffic workload at which the network transitions from a free to a congested state. This way, we can exploit the benefits offered by both complete regularity and partial topology customization. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and a major improvement in the achievable network throughput with minimal impact on network topology.

  • Application-specific network-on-chip architecture customization via long-range link insertion
    IEEE ACM International Conference on Computer-Aided Design Digest of Technical Papers ICCAD, 2005
    Co-Authors: Umit Y. Ogras, Radu Marculescu
    Abstract:

    Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transitions from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the achievable network throughput.
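
The effect of superimposing a long-range link on a regular mesh, as described in the abstracts above, can be illustrated with a minimal sketch. It uses average hop count over all node pairs as a rough proxy for zero-load latency. This is an illustrative assumption, not the authors' synthesis methodology, which selects links based on application traffic and congestion. The function names (`mesh_adjacency`, `add_long_range_link`, `average_hops`) and the 4x4 mesh size are hypothetical.

```python
from collections import deque
from itertools import product


def mesh_adjacency(width, height):
    """Build an undirected adjacency map for a width x height 2D mesh."""
    adj = {(x, y): set() for x, y in product(range(width), range(height))}
    for (x, y) in list(adj):
        for dx, dy in ((1, 0), (0, 1)):
            neighbor = (x + dx, y + dy)
            if neighbor in adj:
                adj[(x, y)].add(neighbor)
                adj[neighbor].add((x, y))
    return adj


def add_long_range_link(adj, a, b):
    """Return a copy of the topology with one extra bidirectional link a <-> b."""
    new_adj = {node: set(neighbors) for node, neighbors in adj.items()}
    new_adj[a].add(b)
    new_adj[b].add(a)
    return new_adj


def hops_from(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_hops(adj):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    total = 0
    pairs = 0
    for src in adj:
        dist = hops_from(adj, src)
        total += sum(d for node, d in dist.items() if node != src)
        pairs += len(dist) - 1
    return total / pairs


mesh = mesh_adjacency(4, 4)
# Hypothetical choice: connect two opposite corners of the mesh.
customized = add_long_range_link(mesh, (0, 0), (3, 3))
print("average hops, plain mesh:     ", average_hops(mesh))
print("average hops, with long link: ", average_hops(customized))
```

A uniform all-pairs average understates the benefit for a real application, where the link would be placed between the nodes that exchange the most traffic.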

Umit Y. Ogras - One of the best experts on this subject based on the ideXlab platform.

  • "It's a small world after all": NoC performance optimization via long-range link insertion
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006
    Co-Authors: Umit Y. Ogras, Radu Marculescu
    Abstract:

    Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture which is neither regular nor fully customized. Instead, the communication architecture we propose is a superposition of a few long-range links and a standard mesh network. The few application-specific long-range links we insert significantly increase the critical traffic workload at which the network transitions from a free to a congested state. This way, we can exploit the benefits offered by both complete regularity and partial topology customization. Indeed, our experimental results demonstrate a significant reduction in the average packet latency and a major improvement in the achievable network throughput with minimal impact on network topology.

  • Application-specific network-on-chip architecture customization via long-range link insertion
    IEEE ACM International Conference on Computer-Aided Design Digest of Technical Papers ICCAD, 2005
    Co-Authors: Umit Y. Ogras, Radu Marculescu
    Abstract:

    Networks-on-chip (NoCs) represent a promising solution to complex on-chip communication problems. The NoC communication architectures considered so far are based on either completely regular or fully customized topologies. In this paper, we present a methodology to automatically synthesize an architecture where a few application-specific long-range links are inserted on top of a regular mesh network. This way, we can better exploit the benefits of both complete regularity and partial customization. Indeed, our experimental results show that inserting application-specific long-range links significantly increases the critical traffic workload at which the network state transitions from a free to a congested regime. This, in turn, results in a significant reduction in the average packet latency and a major improvement in the achievable network throughput.

Axel Jantsch - One of the best experts on this subject based on the ideXlab platform.

  • ASICON - Performance analysis of on-chip bufferless router with multi-ejection ports
    2015 IEEE 11th International Conference on ASIC (ASICON), 2015
    Co-Authors: Chaochao Feng, Axel Jantsch, Zhonghai Lu, Zhuofan Liao, Zhenyu Zhao
    Abstract:

    In general, a bufferless NoC router has only one local output port for ejection, so multiple arriving flits may compete for that single port. In this paper, we propose a reconfigurable bufferless router in which the number of ejection ports can be configured as 2, 3, or 4. Simulation results demonstrate that the average packet latency of the routers with multiple ejection ports is on average 18%, 10%, 6%, 14%, 9%, and 7% lower than that of the router with one ejection port under six synthetic workloads, respectively. For application workloads, the average packet latency of the router with more than two ejection ports is only slightly better than that of the router with one ejection port, a difference that can be neglected. Weighing hardware cost against performance, we conclude that there is no need to implement bufferless routers with 3 or 4 ejection ports, as the router with 2 ejection ports achieves almost the same performance.

  • A Heuristic Framework for Designing and Exploring Deterministic Routing Algorithm for NoCs
    Routing Algorithms in Networks-on-Chip, 2013
    Co-Authors: Abbas Eslami Kiasari, Axel Jantsch, Zhonghai Lu
    Abstract:

    In this chapter, we present a system-level framework for designing minimal deterministic routing algorithms for Networks-on-Chip (NoCs) that are customized for a set of applications. To this end, we first formulate an optimization problem of minimizing average packet latency in the network and then use the simulated annealing heuristic to solve this problem. To estimate the average packet latency, we use a queueing-based analytical model which can capture the burstiness of the traffic. The proposed framework does not require virtual channels to guarantee deadlock freedom since routes are extracted from an acyclic channel dependency graph. Experiments with both synthetic and realistic workloads show the effectiveness of the approach. Results show that the maximum sustainable throughput of the network is improved for different applications and architectures.

  • An Analytical Latency Model for Networks-on-Chip
    IEEE Transactions on Very Large Scale Integration Systems, 2013
    Co-Authors: Abbas Eslami Kiasari, Zhonghai Lu, Axel Jantsch
    Abstract:

    We propose an analytical model based on queueing theory for delay analysis in a wormhole-switched network-on-chip (NoC). The proposed model takes as input an application communication graph, a topology graph, a mapping vector, and a routing matrix, and estimates average packet latency and router blocking time. It works for arbitrary network topologies with deterministic routing under arbitrary traffic patterns. This model can estimate per-flow average latency accurately and quickly, thus enabling fast design space exploration of various design parameters in NoC designs. Experimental results show that the proposed analytical model can predict the average packet latency more than four orders of magnitude faster than an accurate simulation, while the computation error is less than 10% in non-saturated networks for different system-on-chip platforms.

  • A 1-Cycle 1.25GHz Bufferless Router for 3D Network-on-Chip
    IEICE Transactions on Information and Systems, 2012
    Co-Authors: Chaochao Feng, Axel Jantsch, Zhonghai Lu, Minxuan Zhang
    Abstract:

    In this paper, we propose a 1-cycle high-performance 3D bufferless router with a 3-stage permutation network. The proposed router utilizes the 3-stage permutation network instead of the serialized switch allocator and 7 x 7 crossbar to achieve a frequency of 1.25 GHz in TSMC 65 nm technology. Compared with the other two 3D bufferless routers, the proposed router occupies less area and consumes less power. Simulation results under both synthetic and application workloads illustrate that the proposed router achieves lower average packet latency than the other two 3D bufferless routers.

  • ISVLSI - A Low-Overhead Fault-Aware Deflection Routing Algorithm for 3D Network-on-Chip
    2011 IEEE Computer Society Annual Symposium on VLSI, 2011
    Co-Authors: Chaochao Feng, Zhonghai Lu, Minxuan Zhang, Jinwen Li, Jiang Jiang, Axel Jantsch
    Abstract:

    This paper proposes a low-overhead fault-tolerant deflection routing algorithm for 3D NoC, which uses a layer routing table and two TSV state vectors to make efficient routing decisions that avoid both TSV and horizontal link faults. The proposed switch is implemented in hardware with TSMC 65 nm technology and can achieve 250 MHz. Compared with a reinforcement-learning-based fault-tolerant deflection switch with a global routing table, the proposed switch occupies 40% less area and consumes 49% less power. Simulation results demonstrate that the proposed switch has 5% lower average packet latency than the switch with the global routing table under real application workloads, with only 5% performance degradation under synthetic workloads in the presence of 10% link faults.
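
The queueing-based latency estimation mentioned in the analytical-model abstracts above can be sketched in a much simplified form. The sketch below assumes each channel behaves as an independent M/M/1 queue, which is a textbook simplification and not the model from the cited papers (those account for wormhole switching, blocking, and traffic burstiness). All names and parameter values are hypothetical; rates are in packets per cycle and delays in cycles.

```python
def mm1_waiting_time(arrival_rate, service_rate):
    """Mean time spent waiting in an M/M/1 queue, excluding service.

    W_q = rho / (mu - lambda), with rho = lambda / mu.
    Returns infinity when the channel is saturated (lambda >= mu).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service rate positive")
    if arrival_rate >= service_rate:
        return float("inf")
    rho = arrival_rate / service_rate
    return rho / (service_rate - arrival_rate)


def path_latency(channel_loads, service_rate, router_delay):
    """Latency of one flow: per-hop router delay + service time + queue wait."""
    latency = 0.0
    for load in channel_loads:
        latency += router_delay
        latency += 1.0 / service_rate
        latency += mm1_waiting_time(load, service_rate)
    return latency


def estimated_average_packet_latency(flows, service_rate, router_delay):
    """Average over flows, weighted by each flow's packet rate.

    flows: list of (packet_rate, channel_loads), where channel_loads holds
    the total arrival rate on each channel along the flow's route.
    """
    total_rate = sum(rate for rate, _ in flows)
    weighted = sum(
        rate * path_latency(loads, service_rate, router_delay)
        for rate, loads in flows
    )
    return weighted / total_rate


# Hypothetical example: two flows, one crossing a half-loaded channel.
flows = [(1.0, [0.5, 0.0]), (3.0, [0.0])]
print(estimated_average_packet_latency(flows, service_rate=1.0, router_delay=2.0))
```

Because the waiting time grows without bound as the load approaches the service rate, even this crude model reproduces the sharp latency rise near saturation, which is why analytical models of this kind are usually validated only for non-saturated networks.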

Michael Opoku Agyeman - One of the best experts on this subject based on the ideXlab platform.

  • SBAC-PAD Workshops - An Efficient 2D Router Architecture for Extending the Performance of Inhomogeneous 3D NoC-Based Multi-Core Architectures
    2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2016
    Co-Authors: Michael Opoku Agyeman, Wen Zong
    Abstract:

    To meet the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing, and given the performance bottleneck of conventional metal-based interconnects, alternative interconnect fabrics such as inhomogeneous three-dimensional integrated Network-on-Chip (3D NoC) have emerged as a cost-effective solution for emerging multi-core designs. However, these interconnects trade off optimized performance for cost by restricting the number of area- and power-hungry 3D routers. Consequently, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in inhomogeneous 3D NoCs. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in the network to reduce the average packet latency under various traffic loads. Simulation shows that the proposed router can reduce the average packet delay by an average of 45% in 3D NoCs.

  • Heterogeneous 3D Network-on-Chip Architectures: Area and Power Aware Design Techniques
    Journal of Circuits Systems and Computers, 2013
    Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia, Alireza Shahrabi
    Abstract:

    Three-dimensional Network-on-Chip (3D NoC) architectures have gained a lot of popularity as a way to address the on-chip communication delays of next-generation System-on-Chip (SoC) systems. However, the vertical interconnects of 3D NoC are expensive and complex to manufacture. Also, a 3D router architecture consumes more power and occupies more area per chip floorplan compared to a 2D router. Hence, more efficient architectures should be designed. In this paper, we propose area-efficient and low-power 3D heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D NoC architectures. Experimental results show a negligible penalty (less than 5%) in average packet latency of the proposed heterogeneous 3D NoC architectures compared to typical homogeneous 3D NoCs, while the heterogeneity provides power and area efficiency of up to 61% and 19.7%, respectively.

  • Power and area optimisation in heterogeneous 3D networks-on-chip architectures
    ACM Sigarch Computer Architecture News, 2011
    Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia
    Abstract:

    Three-dimensional Network-on-Chip (3D NoC) architectures have attracted a lot of interest as a way to address the on-chip communication delays of modern SoC systems. However, the vertical interconnections between layers are more power- and area-hungry compared to 2D interconnections. In this paper, we propose area-efficient and low-power heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D mesh topologies. Experimental results show a negligible penalty of up to 5% in average packet latency of 3D homogeneous NoC with bus hybrid routers. The heterogeneity, however, provides superiority of up to 67% and 19.7% in power and area efficiency of the NoC resources, respectively.

  • HPCS - Low power heterogeneous 3D Networks-on-Chip architectures
    2011 International Conference on High Performance Computing & Simulation, 2011
    Co-Authors: Michael Opoku Agyeman, Ali Ahmadinia, Alireza Shahrabi
    Abstract:

    Three-dimensional Network-on-Chip (3D NoC) architectures have attracted a lot of interest as a way to address the on-chip communication delays of modern SoC systems. In this paper, we propose low-power heterogeneous NoC architectures, which combine both the power and performance benefits of 2D routers and 3D NoC-bus hybrid router architectures in 3D mesh topologies. Experimental results show a negligible penalty of up to 5% in average packet latency of 3D mesh with homogeneous distribution of 3D NoC-bus hybrid routers. The heterogeneity, however, provides superiority of up to 67% and 19.7% in total crossbar area and power efficiency of the NoC resources, respectively, compared to that of 3D mesh with homogeneous distribution of 3D NoC-bus hybrid routers.

Natalie Enright Jerger - One of the best experts on this subject based on the ideXlab platform.

  • ISLPED - Muffin: Minimally-Buffered Zero-Delay Power-Gating Technique in On-Chip Routers
    2019 IEEE ACM International Symposium on Low Power Electronics and Design (ISLPED), 2019
    Co-Authors: Hossein Farrokhbakht, Hadi Mardani Kamali, Natalie Enright Jerger
    Abstract:

    Although conventional Network-on-Chip (NoC) designs provide high bandwidth, many modern applications for many-core architectures have significant periods of low NoC utilization. Highly provisioned NoCs provide the required performance during periods of high activity; yet, large NoC designs come with high power costs. Furthermore, as technology shrinks, the contribution of static power increases. Hence, numerous NoC power-gating techniques have been proposed to alleviate the growing contribution of static power. However, the efficiency of power-gating techniques decreases due to sporadic packet arrivals across a range of injection rates. In this paper, we propose Minimally-Buffered Router Infrastructure (Muffin), which increases the number of traversals that can be made without needing to power on the routers. Empirical results on SPLASH-2 show that, compared to a conventional power-gating scheme, Muffin improves static power consumption by an average of 95.4%, while improving the average packet latency by 73.7%.

  • ISLPED - SPONGE: A Scalable Pivot-based On/Off Gating Engine for Reducing Static Power in NoC Routers
    Proceedings of the International Symposium on Low Power Electronics and Design - ISLPED '18, 2018
    Co-Authors: Hossein Farrokhbakht, Natalie Enright Jerger, Hadi Mardani Kamali, Shaahin Hessabi
    Abstract:

    Due to the high aggregate idle time of Network-on-Chip (NoC) routers in practical applications, power-gating techniques have been proposed to combat the ever-increasing ratio of static power. Nevertheless, sporadic packet arrivals compromise the effectiveness of power-gating by incurring significant latency and energy overhead. In this paper, we propose a Scalable Pivot-based On/Off Gating Engine (SPONGE), which efficiently manages power-gating decisions and the routing mechanism by adaptively selecting a small set of powered-on columns of routers and keeping the others in a power-gated state. To this end, a router architecture augmented with a novel routing algorithm is proposed in which a packet can traverse powered-off routers without waking them up, and can only turn in predetermined powered-on routers. Experimental results on SPLASH-2 benchmarks demonstrate that, compared to the conventional power-gating method, SPONGE on average not only improves static power consumption by 81.7% but also improves average packet latency by 63%.

  • Moths: Mobile threads for on-chip networks
    ACM Transactions on Embedded Computing Systems, 2013
    Co-Authors: Matthew Misler, Natalie Enright Jerger
    Abstract:

    As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in long-distance communication. Long-distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate problems created by long-distance communication. This article presents Moths, an efficient runtime algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.

  • PACT - Moths: mobile threads for on-chip networks
    Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10, 2010
    Co-Authors: Matthew Misler, Natalie Enright Jerger
    Abstract:

    As the number of cores integrated on a single chip continues to increase, communication has the potential to become a severe bottleneck to overall system performance. The presence of thread sharing and the distribution of data across cache banks on the chip can result in long-distance communication. Long-distance communication incurs substantial latency that impacts performance; furthermore, this communication consumes significant dynamic power when packets are switched over many Network-on-Chip (NoC) links and routers. Thread migration can mitigate problems created by long-distance communication. We present Moths, an efficient run-time algorithm that responds automatically to dynamic NoC traffic patterns, providing beneficial thread migration to decrease overall traffic volume and average packet latency. Moths reduces on-chip network latency by up to 28.4% (18.0% on average) and traffic volume by up to 24.9% (20.6% on average) across a variety of commercial and scientific benchmarks.

  • MICRO - SCARAB: a single cycle adaptive routing and bufferless network
    Proceedings of the 42nd Annual IEEE ACM International Symposium on Microarchitecture - Micro-42, 2009
    Co-Authors: Mitchell Hayenga, Natalie Enright Jerger, Mikko H. Lipasti
    Abstract:

    As technology scaling drives the number of processor cores upward, current on-chip routers consume substantial portions of chip area and power budgets. Since existing research has greatly reduced router latency overheads and capitalized on available on-chip bandwidth, power constraints dominate interconnection network design. Recent research has proposed bufferless routers as a means to alleviate these constraints, but to date all designs exhibit poor operational frequency, throughput, or latency. In this paper, we propose an efficient bufferless router which lowers average packet latency by 17.6% and dynamic energy by 18.3% over existing bufferless on-chip network designs. In order to maintain the energy and area benefits of bufferless routers while delivering ultra-low latencies, our router utilizes an opportunistic processor-side buffering technique and an energy-efficient circuit-switched network for delivering negative acknowledgments for dropped packets.
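
The abstracts above report results as percentage reductions in average packet latency. As a minimal sketch of how such figures are commonly derived, the code below computes the mean latency from a list of per-packet injection and ejection timestamps and then the relative reduction against a baseline. The trace format, function names, and sample numbers are assumptions made for illustration; none of them come from the cited papers or from any particular simulator.

```python
def mean_latency_from_trace(trace):
    """Average packet latency in cycles.

    trace: iterable of (inject_cycle, eject_cycle) pairs, one per packet.
    Latency of a packet is the time from injection at the source to
    ejection at the destination.
    """
    latencies = [eject - inject for inject, eject in trace]
    if not latencies:
        raise ValueError("trace contains no packets")
    if any(latency < 0 for latency in latencies):
        raise ValueError("packet ejected before it was injected")
    return sum(latencies) / len(latencies)


def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Made-up traces for a baseline design and a modified design.
baseline_trace = [(0, 10), (5, 25), (7, 37)]
improved_trace = [(0, 8), (5, 21), (7, 31)]

baseline_latency = mean_latency_from_trace(baseline_trace)
improved_latency = mean_latency_from_trace(improved_trace)
print("baseline:", baseline_latency, "cycles")
print("improved:", improved_latency, "cycles")
print("reduction:", percent_reduction(baseline_latency, improved_latency), "%")
```

Published figures are usually averaged over many workloads and injection rates, so a single percentage hides how the gap changes as the network approaches saturation.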