network processor

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 41,376 Experts worldwide, ranked by the ideXlab platform

Patrick Crowley - One of the best experts on this subject based on the ideXlab platform.

  • virtualization for a network processor runtime system
    2006
    Co-Authors: Brandon Heller, Jonathan S Turner, John Dehart, Patrick Crowley
    Abstract:

    The continuing ossification of the Internet is slowing the pace of network innovation. Network diversification presents one solution to this problem, by virtualizing the network at multiple layers. Diversified networks consist of a shared physical substrate, virtual routers (metarouters), and virtual links (metalinks). Virtualizing routers enables smooth and incremental upgrades to new network services. Our current priority for a diversified router prototype is to enable reserved slices of the network for researchers to perform repeatable, high-speed network experiments. General-purpose processors have well-established techniques for virtualization, but do not scale efficiently to multi-gigabit speeds. To achieve these speeds, we employ network processors (NPs), typically consisting of multicore, multithreaded processors with asymmetric, heterogeneous memories. The complexity and lack of hardware thread isolation in NPs, combined with a lack of simple programming models, create numerous challenges for effective sharing between metarouters. In this paper, we detail strategies for enabling NP virtualization at the link, memory, and processor levels, to better enable a research infrastructure for network innovation.
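
The slicing idea above can be illustrated with a minimal admission check for metarouter resource requests. This is a sketch in Python; the resource names and capacities are hypothetical, and a real NP runtime must also police link bandwidth and compensate for the missing hardware thread isolation.

```python
# Hypothetical substrate capacities for one NP (names and figures invented
# for illustration; they do not describe any particular Intel IXP part).
NP_RESOURCES = {"microengines": 16, "sram_kb": 8192, "dram_mb": 256}

def allocate(slices):
    """Give each metarouter a fixed resource share; reject oversubscription."""
    used = {k: 0 for k in NP_RESOURCES}
    alloc = {}
    for name, req in slices.items():
        if any(used[k] + req[k] > NP_RESOURCES[k] for k in req):
            raise ValueError(f"slice {name!r} oversubscribes the substrate")
        for k in req:
            used[k] += req[k]
        alloc[name] = dict(req)
    return alloc
```

Static partitioning like this is the simplest way to guarantee repeatable experiments: a slice that stays within its reservation cannot be perturbed by its neighbours.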

  • exploiting coarse grained parallelism to accelerate protein motif finding with a network processor
    International Conference on Parallel Architectures and Compilation Techniques, 2005
    Co-Authors: B Wun, Jeremy Buhler, Patrick Crowley
    Abstract:

    While general-purpose processors have only recently employed chip multiprocessor (CMP) architectures, network processors (NPs) have used heterogeneous multi-core architectures since the late 1990s. NPs differ qualitatively from workstation and server CMPs in that they replicate many simple, highly efficient processor cores on a chip, rather than a small number of sophisticated superscalar CPUs. In this paper, we compare the performance of one such NP, the Intel IXP 2850, to that of the Intel Pentium 4 when executing a scientific computing workload with a high degree of thread-level parallelism. Our target program, HMMer, is a bioinformatics tool that identifies conserved motifs in protein sequences. HMMer represents motifs as hidden Markov models (HMMs) and spends most of its time executing the well-known Viterbi algorithm to align proteins to these models. Our observations of HMMer on the IXP are therefore relevant to computations in many other domains that rely on the Viterbi algorithm. We show that the IXP achieves a speedup of 1.82x over the Pentium, despite the Pentium's 1.85x faster clock. Moreover, we argue that next-generation IXP NPs will likely provide a 10-20x speedup for our workload over the IXP 2850, in contrast to the 5-10x speedup expected from a next-generation Pentium-based CMP.
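
The Viterbi kernel that dominates HMMer's runtime can be sketched with a toy HMM. This is a generic Viterbi decoder in Python, not HMMer's profile-HMM formulation; the states, observations, and probabilities in the usage below are invented.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for an observation list."""
    # V[t][s] = probability of the best path that ends in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state maximizing path probability
            prev, p = max(((ps, V[t - 1][ps] * trans_p[ps][s]) for ps in states),
                          key=lambda x: x[1])
            V[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

The inner max over predecessor states is independent for each state, which is the thread-level parallelism the IXP's many simple cores exploit.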

  • characterizing processor architectures for programmable network interfaces
    International Conference on Supercomputing, 2000
    Co-Authors: Patrick Crowley, Marc E. Fiuczynski, Brian N. Bershad
    Abstract:

    The rapid advancements of networking technology have boosted potential bandwidth to the point that the cabling is no longer the bottleneck. Rather, the bottlenecks lie at the crossing points, the nodes of the network, where data traffic is intercepted or forwarded. As a result, there has been tremendous interest in speeding up those nodes, making the equipment run faster by means of specialized chips that handle data trafficking. The network processor is the blanket name for such chips in their varied forms. To date, no performance data exist to aid in the decision of which processor architecture to use in a next-generation network processor. Our goal is to remedy this situation. In this study, we characterize both the application workloads that network processors need to support and emerging applications that we anticipate may be supported in the future. Then, we consider the performance of three sample benchmarks drawn from these workloads on several state-of-the-art processor architectures, including: an aggressive, out-of-order, speculative superscalar processor; a fine-grained multithreaded processor; a single-chip multiprocessor; and a simultaneous multithreaded processor (SMT). The network interface environment is simulated in detail, and our results indicate that SMT is the architecture best suited to this environment.

T N Vijaykumar - One of the best experts on this subject based on the ideXlab platform.

  • efficient use of memory bandwidth to improve network processor throughput
    International Symposium on Computer Architecture, 2003
    Co-Authors: Jahangir Hasan, Satish Chandra, T N Vijaykumar
    Abstract:

    We consider the efficiency of packet buffers used in packet switches built using network processors (NPs). Packet buffers are typically implemented using DRAM, which provides plentiful buffering at a reasonable cost. The problem we address is that a typical NP workload may be unable to utilize the peak DRAM bandwidth. Since the bandwidth of the packet buffer is often the bottleneck in the performance of a shared-memory packet switch, inefficient use of available DRAM bandwidth further reduces the packet throughput. Specialized hardware-based schemes that alleviate the DRAM bandwidth problem in high-end routers may be less applicable to NP-based systems, in which cost is an important consideration. In this paper, we propose cost-effective ways to enhance average-case DRAM bandwidth. In modern DRAMs, successive accesses falling within the same DRAM row are significantly faster than those falling across rows. If accesses to DRAM can be generated differently or reordered to take advantage of fast same-row accesses, peak DRAM bandwidth can be approached. The challenge is in exploiting this "row locality" despite the unpredictable nature of memory accesses in NPs. We propose a set of simple techniques to meet this challenge. These include locality-sensitive buffer allocation on packet input, reordering DRAM accesses to increase locality, and prefetching to reduce the row miss penalty. We evaluate our techniques on cycle-accurate simulations of Intel's IXP 1200 network processor and find that they boost packet throughput on average by 42.7%, utilizing nearly the peak DRAM bandwidth, for a set of common NP applications processing a real trace.
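
The row-locality effect the paper exploits can be shown with a toy DRAM timing model. The row size, latencies, and addresses below are illustrative only, not IXP 1200 or real DRAM figures.

```python
# Toy model: an access to the currently open DRAM row is fast; crossing
# rows pays a precharge/activate penalty. All numbers are assumed.
ROW_SIZE = 4096                  # bytes per DRAM row
T_ROW_HIT, T_ROW_MISS = 2, 10    # cycles per access

def service_cycles(addresses):
    cycles, open_row = 0, None
    for a in addresses:
        row = a // ROW_SIZE
        cycles += T_ROW_HIT if row == open_row else T_ROW_MISS
        open_row = row
    return cycles

# Interleaved accesses from two packet buffers ping-pong between rows...
interleaved = [0, 8192, 64, 8256, 128, 8320]
# ...while grouping same-row accesses approaches peak bandwidth.
reordered = sorted(interleaved, key=lambda a: a // ROW_SIZE)
```

Naively sorting by row is only legal when it preserves per-flow packet order; the paper's techniques get a similar effect through locality-sensitive buffer allocation and constrained access reordering.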

Kaiming Huang - One of the best experts on this subject based on the ideXlab platform.

  • towards software based signature detection for intrusion prevention on the network card
    Lecture Notes in Computer Science, 2006
    Co-Authors: Kaiming Huang
    Abstract:

    CardGuard is a signature detection system for intrusion detection and prevention that scans the entire payload of packets for suspicious patterns and is implemented in software on a network card equipped with an Intel IXP1200 network processor. One card can be used to protect either a single host or a small group of machines connected to a switch. CardGuard is non-intrusive in the sense that no cycles of the host CPU are used for intrusion detection, and the system operates at Fast Ethernet link rate. TCP flows are first reconstructed before they are scanned with the Aho-Corasick algorithm.
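
Aho-Corasick matches many signatures in a single pass over the payload, which is why it suits line-rate scanning. A compact sketch follows; the patterns and text are made up, and CardGuard's implementation runs on IXP1200 microengines rather than in Python.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for a set of string patterns."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first computation of failure links
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]   # inherit matches ending here
    return goto, fail, out

def search(text, automaton):
    """Return (start_index, pattern) for every match, in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

The per-byte work is a constant number of table lookups regardless of how many signatures are loaded, which is the property that makes payload scanning feasible at link rate.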

Amit Singh - One of the best experts on this subject based on the ideXlab platform.

  • design and implementation of a network processor based 10gbps network traffic generator
    Lecture Notes in Computer Science, 2006
    Co-Authors: Sanket Shah, Tularam M Bansod, Amit Singh
    Abstract:

    Testing network processor based high-throughput applications requires a high-speed traffic generator. Commercial traffic generators are very expensive and their internal workings are proprietary. Hence, we have designed a network processor based network Traffic Generator (TG). The Control Plane (CP) takes care of the configuration of the traffic profile. The Data Plane (DP) is responsible for the actual generation of the traffic. The TG requires another copy of itself, or any other traffic generator, for calibration. We explain the calibration methodology and the results of our experiments. Our system has been able to generate traffic at up to 10 Gbps.
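
Calibrating such a generator starts from line-rate arithmetic: each Ethernet frame also costs a fixed preamble and inter-frame gap on the wire. A quick sketch (the 8 B preamble and 12 B inter-frame gap are standard Ethernet figures; the frame sizes chosen are just examples):

```python
LINK_BPS = 10_000_000_000   # 10 Gbps link
OVERHEAD = 8 + 12           # preamble + inter-frame gap, bytes on the wire

def packets_per_second(frame_bytes):
    """Maximum sustainable frame rate for a given frame size (incl. FCS)."""
    wire_bits = (frame_bytes + OVERHEAD) * 8
    return LINK_BPS // wire_bits
```

Minimum-size 64-byte frames yield roughly 14.88 Mpps, the usual worst case a 10 Gbps generator (and any device under test) must sustain.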

Andreas Herkersdorf - One of the best experts on this subject based on the ideXlab platform.

  • a folded pipeline network processor architecture for 100 gbit s networks
    Architectures for Networking and Communications Systems, 2010
    Co-Authors: Kimon Karras, Thomas Wild, Andreas Herkersdorf
    Abstract:

    Ethernet, although initially conceived as a Local Area Network technology, has been steadily making inroads into access and core networks. This has led to a need for higher link speeds, which are now reaching 100 Gbit/s. Packet processing at this rate represents a significant challenge that needs to be met efficiently, while minimizing power consumption and chip area. This level of throughput favours a pipelined approach, so this paper takes a traditional pipeline and breaks it down into mini-pipelines, which can perform coarse-grained processing (such as processing an MPLS label to completion). These mini-pipelines are then parallelized and used to construct a folded pipeline architecture, which augments the traditional approach by significantly reducing power consumption, a key problem in future routers. The paper compares the two approaches, discusses their advantages and disadvantages, and demonstrates by quantitative measures that the folded pipeline architecture is the better solution for 100 Gbit/s processing.

  • flexpath np a network processor concept with application driven flexible processing paths
    International Conference on Hardware Software Codesign and System Synthesis, 2005
    Co-Authors: Andreas Herkersdorf, Thomas Wild, Rainer Ohlendorf
    Abstract:

    In this paper, we present a new architectural concept for network processors called FlexPath NP. The central idea behind FlexPath NP is to systematically map network processor (NP) application sub-functions onto both SW-programmable processor (CPU) resources and (re-)configurable HW building blocks, such that different packet flows are forwarded via different, optimized processing paths through the NP. Packets with well-understood, relatively simple processing requirements may even bypass the central CPU complex (AutoRoute). In consequence, CPU processing resources are used more effectively, and the overall NP performance and throughput are improved compared to conventional NP architectures. We present analytical performance estimations to quantify the performance advantage of FlexPath (expressed as available CPU instructions for each packet traversing the CPUs) and introduce a platform-based System-on-Programmable-Chip (SoPC) architecture which implements the FlexPath NP concept.
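
The metric of available CPU instructions per packet follows from a simple budget. A hedged sketch (the clock rates, core counts, packet rates, and bypass fraction below are assumed for illustration, not FlexPath's measured figures):

```python
def instr_per_packet(cpu_hz, n_cpus, pkt_rate, ipc=1.0, bypass_frac=0.0):
    """Instruction budget per packet that actually traverses the CPUs.

    bypass_frac models AutoRoute: packets handled entirely in hardware
    leave the full CPU instruction budget to the remaining packets.
    """
    cpu_pkt_rate = pkt_rate * (1.0 - bypass_frac)
    return cpu_hz * n_cpus * ipc / cpu_pkt_rate
```

For example, letting half the traffic bypass the CPU complex doubles the instruction budget for each packet that still needs software processing, which is the quantitative argument behind FlexPath's flexible paths.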

  • performance evaluation of network processor architectures combining simulation with analytical estimation
    Computer Networks, 2003
    Co-Authors: Samarjit Chakraborty, Simon Kunzli, Andreas Herkersdorf, Lothar Thiele, Patricia M Sagmeister
    Abstract:

    The designs of most system-on-a-chip (SoC) architectures rely on simulation as a means for performance estimation. Such designs usually start with a parameterizable template architecture, and the design space exploration is restricted to identifying suitable parameters for all the architectural components. However, in the case of heterogeneous SoC architectures such as network processors, the design space exploration also involves a combinatorial aspect--which architectural components are to be chosen, how they should be interconnected, task mapping decisions--thereby increasing the design space. Moreover, in the case of network processor architectures there is also an associated uncertainty in terms of the application scenario and the traffic it will be required to process. As a result, simulation is no longer a feasible option for evaluating such architectures in any automated or semi-automated design space exploration process, due to the high simulation times involved. To address this problem, in this paper we hypothesize that the design space exploration for network processors should be separated into multiple stages, each having a different level of abstraction. Further, it would be appropriate to use analytical evaluation frameworks during the initial stages and resort to simulation techniques only when a relatively small set of potential architectures is identified. None of the known performance evaluation methods for network processors have been positioned from this perspective. We show that there are already suitable analytical models for network processor performance evaluation which may be used to support our hypothesis. To this end, we choose a reference system-level model of a network processor architecture and compare its performance evaluation results derived using a known analytical model [Thiele et al., Design space exploration of network processor architectures, in: Proc. 1st Workshop on Network Processors, Cambridge, MA, February 2002; Thiele et al., A framework for evaluating design tradeoffs in packet processing architectures, in: Proc. 39th Design Automation Conference (DAC), New Orleans, USA, ACM Press, 2002] with the results derived by detailed simulation. Based on this comparison, we propose a scheme for the design space exploration of network processor architectures where both analytical performance evaluation techniques and simulation techniques have unique roles to play.
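
A first-stage analytical screen can be as simple as a utilization bound per candidate architecture, pruning infeasible designs before any cycle-accurate simulation. This toy model is not the service-curve framework of Thiele et al., and every number in the usage below is assumed.

```python
def utilization(pkt_rate, cycles_per_pkt, clock_hz):
    """Fraction of compute capacity consumed at a given packet rate."""
    return pkt_rate * cycles_per_pkt / clock_hz

def feasible(pkt_rate, cycles_per_pkt, n_cores, clock_hz):
    # Assumes the workload parallelizes perfectly across cores; a real
    # analytical framework also bounds delay and backlog under bursty traffic.
    return utilization(pkt_rate, cycles_per_pkt, n_cores * clock_hz) < 1.0
```

Evaluating such a bound takes microseconds per candidate, which is what makes analytical models usable inside an automated exploration loop where simulating each candidate would take hours.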

  • ibm powernp network processor hardware software and applications
    Ibm Journal of Research and Development, 2003
    Co-Authors: J R Allen, Andreas Herkersdorf, B M Bass, C Basso, R H Boivie, Jean Calvignac, G T Davis, L Frelechoux, M Heddes, A Kind
    Abstract:

    Deep packet processing is migrating to the edges of service provider networks to simplify and speed up core functions. On the other hand, the cores of such networks are migrating to the switching of high-speed traffic aggregates. As a result, more services will have to be performed at the edges, on behalf of both the core and the end users. Associated network equipment will therefore require high flexibility to support evolving high-level services as well as extraordinary performance to deal with the high packet rates. Whereas, in the past, network equipment was based either on general-purpose processors (GPPs) or application-specific integrated circuits (ASICs), favoring flexibility over speed or vice versa, the network processor approach achieves both flexibility and performance. The key advantage of network processors is that hardware-level performance is complemented by flexible software architecture. This paper provides an overview of the IBM PowerNP™ NP4GS3 network processor and how it addresses these issues. Its hardware and software design characteristics and its comprehensive base operating software make it well suited for a wide range of networking applications.