Memcached

The experts below are selected from a list of 1287 experts worldwide, ranked by the ideXlab platform.

Thomas F. Wenisch - One of the best experts on this subject based on the ideXlab platform.

Thin servers with smart pipes: designing SoC accelerators for Memcached

International Symposium on Computer Architecture, 2013

Co-Authors: Kevin T. Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, Thomas F. Wenisch

Abstract:

Distributed in-memory key-value stores, such as Memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of Memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully-designed benchmarks that exercise the range of Memcached behavior. We discover that, regardless of CPU microarchitecture, Memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel: long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks. Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance Memcached deployment. TSSP couples an embedded-class low-power core to a Memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data lookup. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.
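
The split the abstract describes (GET requests on a dedicated fast path, everything else on a general-purpose core) can be illustrated in software. The sketch below is only a toy model under stated assumptions: the class name `SplitPathCache` and its counters are hypothetical, the "fast path" is an ordinary dictionary probe standing in for the hardware lookup, and only the basic `get` and `set` commands of the Memcached text protocol are parsed. It is not the TSSP design itself.

```python
class SplitPathCache:
    """Toy model: GETs take a 'fast path', all other requests a 'slow path'.

    Hypothetical illustration only; the fast path here is a plain dict probe.
    """

    def __init__(self):
        self._table = {}
        self.fast_path_requests = 0
        self.slow_path_requests = 0

    def _fast_get(self, key):
        # Stand-in for the accelerator: one hash-table lookup, then format the reply.
        self.fast_path_requests += 1
        value = self._table.get(key)
        if value is None:
            return b"END\r\n"  # text-protocol reply for a miss
        size = str(len(value)).encode()
        # Flags are always reported as 0 in this simplified model.
        return b"VALUE " + key + b" 0 " + size + b"\r\n" + value + b"\r\nEND\r\n"

    def _slow_set(self, key, value):
        # Stand-in for the embedded core handling updates in software.
        self.slow_path_requests += 1
        self._table[key] = value
        return b"STORED\r\n"

    def handle(self, request):
        header, _, rest = request.partition(b"\r\n")
        parts = header.split()
        if len(parts) == 2 and parts[0] == b"get":
            return self._fast_get(parts[1])
        if len(parts) == 5 and parts[0] == b"set":
            # set <key> <flags> <exptime> <bytes>; flags and exptime are ignored here.
            nbytes = int(parts[4])
            return self._slow_set(parts[1], rest[:nbytes])
        self.slow_path_requests += 1
        return b"ERROR\r\n"
```

In this model a read-heavy request stream keeps almost all traffic on the fast path, which is the property the abstract relies on when it offloads GET handling.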

Kevin T. Lim - One of the best experts on this subject based on the ideXlab platform.

Thin servers with smart pipes: designing SoC accelerators for Memcached

International Symposium on Computer Architecture, 2013

Co-Authors: Kevin T. Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, Thomas F. Wenisch

Abstract:

Distributed in-memory key-value stores, such as Memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of Memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully-designed benchmarks that exercise the range of Memcached behavior. We discover that, regardless of CPU microarchitecture, Memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel: long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks. Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance Memcached deployment. TSSP couples an embedded-class low-power core to a Memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data lookup. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.

FPGA - An FPGA Memcached appliance

Proceedings of the ACM SIGDA international symposium on Field programmable gate arrays - FPGA '13, 2013

Co-Authors: Sai Rahul Chalamalasetti, Kevin T. Lim, Parthasarathy Ranganathan, Mitch Wright, Alvin Auyoung, Martin Margala

Abstract:

Providing low-latency access to large amounts of data is one of the foremost requirements for many web services. To address these needs, systems such as Memcached have been created which provide a distributed, all in-memory key-value store. These systems are critical and often deployed across hundreds or thousands of servers. However, these systems are not well matched for commodity servers, as they require significant CPU resources to achieve reasonable network bandwidth, yet the core Memcached functions do not benefit from the high performance of standard server CPUs. In this paper, we demonstrate the design of an FPGA-based Memcached appliance. We take Memcached, a complex software system, and implement its core functionality on an FPGA. By leveraging the FPGA's design and utilizing its customizable logic to create a specialized appliance we are able to tightly integrate networking, compute, and memory. This integration allows us to overcome many of the bottlenecks found in standard servers. Our design provides performance on-par with baseline servers, but consumes only 9% of the power of the baseline. Scaled out, we see benefits at the data center level, substantially improving the performance-per-dollar while improving energy efficiency by 3.2X to 10.9X.

Dhabaleswar K. Panda - One of the best experts on this subject based on the ideXlab platform.

IPDPS - High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016

Co-Authors: Dipti Shankar, Wasi-ur-rahman, Nusrat Sharmin Islam, Dhabaleswar K. Panda

Abstract:

High-performance, distributed key-value store-based caching solutions, such as Memcached, have played a crucial role in enhancing the performance of many Online and Offline Big Data applications. The advent of high-performance storage (e.g. NVMe SSD) and interconnects (e.g. InfiniBand) on modern clusters has directed several efforts towards employing 'RAM+SSD' hybrid storage architectures for key-value stores running over RDMA, in order to achieve high data retention, while maintaining low latency and high throughput. In this paper, we first perform a detailed analysis of the behavior of hybrid Memcached designs, and identify two major bottlenecks: the client-side wait for request completion and the server-side SSD I/O overhead. Based on this analysis, we propose new non-blocking API extensions for Memcached Set and Get operations, to support high data retention while trying to achieve near in-memory speeds. We enhance the existing runtime designs on both the client and the server, and propose an adaptive slab manager with different I/O schemes for higher throughput. We demonstrate that LibMemcached-based applications can achieve high performance by exploiting the communication/computation overlap that is made possible by the proposed non-blocking API extensions, with either In-memory or SSD-assisted designs of RDMA-based Memcached. Performance evaluations show that the proposed extensions and designs can achieve up to 16x improvement for Memcached Set/Get latency over current hybrid design for RDMA-Memcached when all data does not fit in memory, and up to 3.6x improvement over pure in-memory design of default Memcached over 'IP-over-IB' when all data can fit in memory.
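
The communication/computation overlap mentioned in the abstract can be sketched with futures. This is a minimal illustration, not the paper's API: the class `NonBlockingClient` and its methods `iset` and `iget` are hypothetical names, the store is a local dictionary, and `time.sleep` stands in for network and SSD latency. A thread pool replaces whatever mechanism the RDMA-based runtime actually uses.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class NonBlockingClient:
    """Hypothetical client whose set/get calls return immediately with a Future."""

    def __init__(self, latency_s=0.01, workers=4):
        self._store = {}
        self._latency_s = latency_s  # simulated round-trip time (assumption)
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def _blocking_set(self, key, value):
        time.sleep(self._latency_s)  # pretend to wait on network / SSD I/O
        self._store[key] = value
        return True

    def _blocking_get(self, key):
        time.sleep(self._latency_s)
        return self._store.get(key)

    def iset(self, key, value) -> Future:
        # Issue the request and return without waiting for completion.
        return self._pool.submit(self._blocking_set, key, value)

    def iget(self, key) -> Future:
        return self._pool.submit(self._blocking_get, key)

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    client = NonBlockingClient()
    pending = client.iset("user:1", b"alice")  # request is now in flight
    checksum = sum(range(1000))                # useful work overlaps the request
    assert pending.result()                    # wait only when the result is needed
    value = client.iget("user:1").result()
    client.close()
    return value, checksum
```

The client-side wait identified as a bottleneck corresponds to calling `result()` immediately after every request; deferring that call is what allows the overlap.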

Can RDMA benefit online data processing workloads on Memcached and MySQL

International Symposium on Performance Analysis of Systems and Software, 2015

Co-Authors: Dipti Shankar, Jithin Jose, Nusrat Sharmin Islam, Md Wasiurrahman, Dhabaleswar K. Panda

Abstract:

At the onset of the widespread usage of social networking services in the Web 2.0/3.0 era, leveraging a distributed and scalable caching layer like Memcached is often invaluable to application server performance. Since a majority of the existing clusters today are equipped with modern high speed interconnects such as InfiniBand, that offer high bandwidth and low latency communication, there is potential to improve the response time and throughput of the application servers, by taking advantage of advanced features like RDMA. We explore the potential of employing RDMA to improve the performance of Online Data Processing (OLDP) workloads on MySQL using Memcached for real-world web applications.

PABS@ICPE - Accelerating Big Data Processing on Modern Clusters

Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems - PABS '15, 2015

Co-Authors: Dhabaleswar K. Panda

Abstract:

Modern clusters feature multi-/many-core architectures, high-performance RDMA-enabled interconnects, and SSD-based storage devices. The Hadoop framework is used extensively for Big Data processing, and the Spark framework is emerging for real-time analytics. Similarly, Memcached is used in data centers in Web 2.0 environments. This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Performance benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of this middleware.

Big Data - Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads

2015 IEEE International Conference on Big Data (Big Data), 2015

Co-Authors: Dipti Shankar, Wasi-ur-rahman, Nusrat Sharmin Islam, Dhabaleswar K. Panda

Abstract:

Leveraging a distributed key-value based caching layer has proven to be invaluable for scalable data-intensive web applications. With the emergence of high-performance storage (e.g. SSD) and interconnects (e.g. InfiniBand) on modern clusters, several efforts are being made to design high-performance key-value stores that can operate well with ‘RAM+SSD’ hybrid storage architecture. This has made it essential for us to design micro-benchmarks that are tailored to evaluate these upcoming, hybrid designs. In this paper, we study popular web-scale and cloud serving workloads, to identify different application-specific aspects, including commonly occurring data request distributions, update patterns, and environmental factors, that affect the performance of hybrid key-value stores. Based on these characterization studies, we propose a micro-benchmark suite that can be used to study high-performance, hybrid key-value stores on modern clusters, from the perspectives of both the application and the key-value store. We demonstrate its ease-of-use using database-integrated and stand-alone execution modes. Performance evaluations with different Memcached distributions, such as SSD-Assisted RDMA-Memcached, fatcache, and twemcache, over different networks/protocols, show that ‘SSD+RDMA’ can significantly enhance the performance of Memcached for various read-only and read-heavy workloads, that are representative of several common web-scale workloads.
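
A micro-benchmark of the kind described needs a request generator that controls the key popularity distribution and the read/update mix. The sketch below is one possible shape for such a generator, not the suite from the paper: the function names, the Zipf-like popularity model, and the default skew parameter are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def make_workload(num_keys, num_ops, read_fraction, zipf_s=0.99, seed=0):
    """Return a list of (op, key) pairs with skewed key popularity.

    Keys are ranked; the key at rank r is drawn with weight 1 / r**zipf_s
    (a Zipf-like assumption). read_fraction sets the share of "get" operations.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    keys = [f"key{i}" for i in range(num_keys)]
    weights = [1.0 / (rank ** zipf_s) for rank in range(1, num_keys + 1)]
    chosen = rng.choices(keys, weights=weights, k=num_ops)
    ops = []
    for key in chosen:
        op = "get" if rng.random() < read_fraction else "set"
        ops.append((op, key))
    return ops


def key_popularity(ops):
    """Count how often each key is requested, most popular first."""
    return Counter(key for _, key in ops).most_common()
```

Setting `read_fraction` to 1.0 gives a read-only stream and values such as 0.9 or 0.95 give read-heavy ones, the two workload classes the abstract reports on.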

Paul Lu - One of the best experts on this subject based on the ideXlab platform.

Low-Latency Caching for Cloud-Based Web Applications

Work, 2011

Co-Authors: Adam Wolfe Gordon, Paul Lu

Abstract:

Many Web applications are now hosted in elastic cloud environments where the unit of resource allocation is a virtual machine (VM) instance; entire VMs are added or removed to scale up or scale down. A variety of techniques can reduce the latency of communication between VMs co-located on the same server in, say, a private cloud. For example, paravirtualized network mechanisms (e.g., vhost and virtio in Linux KVM) can optimize the number of protection boundary crossings. Inter-VM shared memory can further reduce boundary crossings after setting up a shared region. We present the design, implementation, and an evaluation of Nahanni Memcached, a port of the well-known Memcached that uses inter-VM shared memory instead of a virtual network for cache reads. As a widely deployed cache for back-end datastores and databases, Memcached's latency is important to the performance of many well-known web sites (e.g., Facebook, Twitter) and cloud platforms (e.g., Google's App Engine). Although using shared-memory IPC is a well-known strategy, the recent introduction of the ivshmem inter-VM shared memory mechanism (also known as Nahanni) to Linux KVM makes the strategy practical for virtual machines. Using the Yahoo Cloud Serving Benchmark, we confirm the intuition that Nahanni Memcached can reduce the latency of cache read operations by up to 86%, and that given reasonable hit rates, this can reduce the total latency of read-related operations for a workload by up to 45% compared to standard Memcached. When using the experimental paravirtualized vhost networking mechanism in Linux KVM, Nahanni Memcached offers a smaller, but still significant, advantage of 29%.
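
Why a large reduction in cache read latency turns into a smaller reduction in total read latency can be seen with a simple cache-aside model. The model below is an assumption made for illustration, not the paper's methodology: every read pays the cache latency, and a miss additionally pays a backend latency. The function names and all input values are hypothetical.

```python
def expected_read_latency(cache_latency, backend_latency, hit_rate):
    """Mean latency of one read: always probe the cache, go to the backend on a miss."""
    return cache_latency + (1.0 - hit_rate) * backend_latency


def total_reduction(cache_latency, backend_latency, hit_rate, cache_reduction):
    """Fractional drop in mean read latency when only the cache probe gets faster."""
    before = expected_read_latency(cache_latency, backend_latency, hit_rate)
    faster_cache = cache_latency * (1.0 - cache_reduction)
    after = expected_read_latency(faster_cache, backend_latency, hit_rate)
    return 1.0 - after / before
```

With a hit rate of 1.0 the total reduction equals the cache reduction. With illustrative inputs of a 0.9 hit rate and a backend ten times slower than the cache, an 86% faster cache read lowers the modeled total by 43%, because the unchanged miss penalty dilutes the gain.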

David Meisner - One of the best experts on this subject based on the ideXlab platform.

Thin servers with smart pipes: designing SoC accelerators for Memcached

International Symposium on Computer Architecture, 2013

Co-Authors: Kevin T. Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, Thomas F. Wenisch

Abstract:

Distributed in-memory key-value stores, such as Memcached, are central to the scalability of modern internet services. Current deployments use commodity servers with high-end processors. However, given the cost-sensitivity of internet services and the recent proliferation of volume low-power System-on-Chip (SoC) designs, we see an opportunity for alternative architectures. We undertake a detailed characterization of Memcached to reveal performance and power inefficiencies. Our study considers both high-performance and low-power CPUs and NICs across a variety of carefully-designed benchmarks that exercise the range of Memcached behavior. We discover that, regardless of CPU microarchitecture, Memcached execution is remarkably inefficient, saturating neither network links nor available memory bandwidth. Instead, we find performance is typically limited by the per-packet processing overheads in the NIC and OS kernel: long code paths limit CPU performance due to poor branch predictability and instruction fetch bottlenecks. Our insights suggest that neither high-performance nor low-power cores provide a satisfactory power-performance trade-off, and point to a need for tighter integration of the network interface. Hence, we argue for an alternate architecture, Thin Servers with Smart Pipes (TSSP), for cost-effective high-performance Memcached deployment. TSSP couples an embedded-class low-power core to a Memcached accelerator that can process GET requests entirely in hardware, offloading both network handling and data lookup. We demonstrate the potential benefits of our TSSP architecture through an FPGA prototyping platform, and show the potential for a 6X-16X power-performance improvement over conventional server baselines.
