Server Utilization

The experts below are selected from a list of 11,049 experts worldwide, ranked by the ideXlab platform.

Sujin Ha - One of the best experts on this subject based on the ideXlab platform.

  • OMBM: optimized memory bandwidth management for ensuring QoS and high Server Utilization
    Cluster Computing, 2018
    Co-Authors: Hanul Sung, Sujin Ha
    Abstract:

    Latency-critical workloads such as web search engines, social networks and financial market applications are sensitive to tail latencies for meeting service level objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with other co-executing workloads, a service provider runs the latency-critical workload alone, so the data center hosting latency-critical workloads has exceedingly low hardware resource Utilization. To improve hardware resource Utilization, the service provider has to co-locate the latency-critical workloads with batch-processing ones. However, because memory bandwidth cannot be provided in isolation the way cores and cache memory can, the latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them exclusively. To solve this problem, we propose an optimized memory bandwidth management approach that ensures quality of service (QoS) and high Server Utilization by providing isolated shared resources, including memory bandwidth, to the latency-critical workload and the co-executing batch-processing ones. First, our approach performs a few pre-profiling runs using a divide-and-conquer method, under the assumption that memory bandwidth contention is at its worst. Second, based on the pre-profiling results, it predicts the memory bandwidth required to meet the SLO at every queries-per-second (QPS) level. Finally, it allocates the isolated memory bandwidth that guarantees the SLO to the latency-critical workload and the remaining bandwidth to the co-executing batch-processing workloads. Experiments show that the proposed approach achieves up to 99% SLO assurance and improves Server Utilization by up to 6.5×.
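
A minimal sketch of the divide-and-conquer pre-profiling step described in the abstract above: a binary search over memory-bandwidth caps for the smallest allocation at which the latency-critical workload still meets its SLO under worst-case contention. The callback name run_with_bandwidth_cap, the bandwidth range, and the tolerance are illustrative assumptions, not part of the OMBM artifact.

```python
def find_min_bandwidth(run_with_bandwidth_cap, slo_latency_ms,
                       lo_mbps=1_000, hi_mbps=50_000, tol_mbps=500):
    """Binary-search the smallest memory-bandwidth cap (MB/s) at which the
    latency-critical workload still meets its tail-latency SLO while a
    synthetic co-runner applies worst-case bandwidth contention."""
    best = hi_mbps
    while hi_mbps - lo_mbps > tol_mbps:
        mid = (lo_mbps + hi_mbps) // 2
        tail_latency_ms = run_with_bandwidth_cap(mid)  # one pre-profiling run
        if tail_latency_ms <= slo_latency_ms:
            best, hi_mbps = mid, mid   # SLO met: try a smaller allocation
        else:
            lo_mbps = mid              # SLO violated: need more bandwidth
    return best
```

Here run_with_bandwidth_cap stands in for one profiling run: cap the latency-critical workload's memory bandwidth, drive it at the target load next to a bandwidth-heavy co-runner, and return the measured tail latency.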

  • OMBM: Optimized Memory Bandwidth Management for Ensuring QoS and High Server Utilization
    2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W), 2017
    Co-Authors: Hanul Sung, Sujin Ha
    Abstract:

    Latency-critical workloads such as web search engines, social networks and financial market applications are sensitive to tail latencies for meeting Service Level Objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with other co-executing workloads, a service provider runs the latency-critical workload alone, so the data center hosting latency-critical workloads has exceedingly low hardware resource Utilization. To improve hardware resource Utilization, the service provider has to co-locate the latency-critical workloads with other batch-processing workloads. However, because memory bandwidth cannot be provided in isolation the way cores and cache memory can, the latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them exclusively. To solve this problem, we propose an optimized memory bandwidth management approach for ensuring Quality of Service (QoS) and high Server Utilization. By providing isolated shared resources, including memory bandwidth, to the latency-critical workload and the co-executing batch-processing workloads, our approach guarantees SLOs and improves hardware resource Utilization. First, with minimal pre-profiling, we predict the memory bandwidth needed to meet the SLO at every Queries Per Second (QPS) level while the latency-critical workload executes. Then, our approach allocates the isolated memory bandwidth that guarantees the SLO to the latency-critical workload and the remaining bandwidth to the co-executing batch-processing workloads. As a result, the proposed approach achieves up to 99% SLO assurance and improves Server Utilization by up to 6.5x.
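
Complementing the sketch above, the fragment below illustrates the allocation step this abstract describes: reserve the predicted SLO-preserving share of memory bandwidth for the latency-critical workload and hand the remainder to the batch jobs. The per-socket bandwidth, the QPS-to-bandwidth table, and the lookup rule are assumed for illustration only.

```python
TOTAL_BANDWIDTH_MBPS = 60_000   # assumed per-socket memory bandwidth (MB/s)

# assumed pre-profiled minimum bandwidth (MB/s) that meets the SLO per QPS level
PREDICTED_NEED = {1_000: 8_000, 5_000: 18_000, 10_000: 30_000}

def split_bandwidth(current_qps):
    """Reserve the SLO-preserving share for the latency-critical workload and
    give the rest to the co-executing batch jobs."""
    # use the smallest profiled QPS level that covers the current load
    level = min((q for q in PREDICTED_NEED if q >= current_qps),
                default=max(PREDICTED_NEED))
    lc_share = PREDICTED_NEED[level]
    batch_share = max(TOTAL_BANDWIDTH_MBPS - lc_share, 0)
    return lc_share, batch_share

print(split_bandwidth(4_200))   # -> (18000, 42000) with the numbers above
```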

Hanul Sung - One of the best experts on this subject based on the ideXlab platform.

  • OMBM: optimized memory bandwidth management for ensuring QoS and high Server Utilization
    Cluster Computing, 2018
    Co-Authors: Hanul Sung, Sujin Ha
    Abstract:

    Latency-critical workloads such as web search engines, social networks and financial market applications are sensitive to tail latencies for meeting service level objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with other co-executing workloads, a service provider runs the latency-critical workload alone, so the data center hosting latency-critical workloads has exceedingly low hardware resource Utilization. To improve hardware resource Utilization, the service provider has to co-locate the latency-critical workloads with batch-processing ones. However, because memory bandwidth cannot be provided in isolation the way cores and cache memory can, the latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them exclusively. To solve this problem, we propose an optimized memory bandwidth management approach that ensures quality of service (QoS) and high Server Utilization by providing isolated shared resources, including memory bandwidth, to the latency-critical workload and the co-executing batch-processing ones. First, our approach performs a few pre-profiling runs using a divide-and-conquer method, under the assumption that memory bandwidth contention is at its worst. Second, based on the pre-profiling results, it predicts the memory bandwidth required to meet the SLO at every queries-per-second (QPS) level. Finally, it allocates the isolated memory bandwidth that guarantees the SLO to the latency-critical workload and the remaining bandwidth to the co-executing batch-processing workloads. Experiments show that the proposed approach achieves up to 99% SLO assurance and improves Server Utilization by up to 6.5×.

  • OMBM: Optimized Memory Bandwidth Management for Ensuring QoS and High Server Utilization
    2017 IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS*W), 2017
    Co-Authors: Hanul Sung, Sujin Ha
    Abstract:

    Latency-critical workloads such as web search engines, social networks and financial market applications are sensitive to tail latencies for meeting Service Level Objectives (SLOs). Since unexpected tail latencies are caused by sharing hardware resources with other co-executing workloads, a service provider runs the latency-critical workload alone, so the data center hosting latency-critical workloads has exceedingly low hardware resource Utilization. To improve hardware resource Utilization, the service provider has to co-locate the latency-critical workloads with other batch-processing workloads. However, because memory bandwidth cannot be provided in isolation the way cores and cache memory can, the latency-critical workloads experience poor performance isolation even when cores and cache memory are allocated to them exclusively. To solve this problem, we propose an optimized memory bandwidth management approach for ensuring Quality of Service (QoS) and high Server Utilization. By providing isolated shared resources, including memory bandwidth, to the latency-critical workload and the co-executing batch-processing workloads, our approach guarantees SLOs and improves hardware resource Utilization. First, with minimal pre-profiling, we predict the memory bandwidth needed to meet the SLO at every Queries Per Second (QPS) level while the latency-critical workload executes. Then, our approach allocates the isolated memory bandwidth that guarantees the SLO to the latency-critical workload and the remaining bandwidth to the co-executing batch-processing workloads. As a result, the proposed approach achieves up to 99% SLO assurance and improves Server Utilization by up to 6.5x.

Lingjia Tang - One of the best experts on this subject based on the ideXlab platform.

  • SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers
    International Symposium on Microarchitecture, 2014
    Co-Authors: Yunqi Zhang, Jason Mars, Michael A Laurenzano, Lingjia Tang
    Abstract:

    One of the key challenges for improving efficiency in warehouse scale computers (WSCs) is to improve Server Utilization while guaranteeing the quality of service (QoS) of latency-sensitive applications. To this end, prior work has proposed techniques to precisely predict performance and QoS interference in order to identify 'safe' application co-locations. However, such techniques are only applicable to resources shared across cores. Achieving equally precise interference prediction on real-system simultaneous multithreading (SMT) architectures has been a challenging open problem because of the complexity introduced by sharing resources within a core. In this paper, we demonstrate through a real-system investigation that the fundamental difference between resource sharing behaviors on CMP and SMT architectures calls for a redesign of the way we model interference. For SMT Servers, the interference on different shared resources, including private caches, memory ports, and integer and floating-point functional units, does not correlate across resources. This insight suggests the necessity of decoupling interference into multiple resource sharing dimensions. In this work, we propose SMiTe, a methodology that enables precise performance prediction for SMT co-location on real-system commodity processors. With a set of Rulers, which are carefully designed software stressors that apply pressure to a multidimensional space of shared resources, we quantify application sensitivity and contentiousness in a decoupled manner. We then establish a regression model that combines the sensitivity and contentiousness in different dimensions to predict performance interference. Using this methodology, we are able to precisely predict the performance interference in SMT co-location with an average error of 2.80% on SPEC CPU2006 and 1.79% on CloudSuite. Our evaluation shows that SMiTe allows us to improve the Utilization of WSCs by up to 42.57% while enforcing an application's QoS requirements.
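
A hedged sketch of the kind of decoupled, per-dimension regression model the abstract describes: predicted slowdown is regressed on the product of an application's sensitivity and a co-runner's contentiousness in each shared-resource dimension. The dimension names, the use of ordinary least squares, and the data layout are assumptions for illustration, not SMiTe's exact formulation.

```python
import numpy as np

DIMENSIONS = ["private_cache", "memory_ports", "int_units", "fp_units"]

def features(sensitivity, contentiousness):
    """One feature per dimension: the app's sensitivity times the co-runner's pressure."""
    return np.array([sensitivity[d] * contentiousness[d] for d in DIMENSIONS])

def fit_model(training_pairs):
    """training_pairs: (sensitivity, contentiousness, measured_slowdown) tuples
    collected by co-running applications with Ruler-style stressors."""
    X = np.array([features(s, c) for s, c, _ in training_pairs])
    y = np.array([slowdown for _, _, slowdown in training_pairs])
    X = np.hstack([X, np.ones((len(X), 1))])        # intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # ordinary least squares fit
    return coef

def predict_slowdown(coef, sensitivity, contentiousness):
    """Predict interference for a candidate co-location from the fitted model."""
    x = np.append(features(sensitivity, contentiousness), 1.0)
    return float(x @ coef)
```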

  • Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers
    International Symposium on Computer Architecture, 2013
    Co-Authors: Hailong Yang, Alexander D Breslow, Jason Mars, Lingjia Tang
    Abstract:

    Ensuring the quality of service (QoS) for latency-sensitive applications while allowing co-locations of multiple applications on Servers is critical for improving Server Utilization and reducing cost in modern warehouse-scale computers (WSCs). Recent work relies on static profiling to precisely predict the QoS degradation that results from performance interference among co-running applications to increase the number of "safe" co-locations. However, these static profiling techniques have several critical limitations: 1) a priori knowledge of all workloads is required for profiling, 2) it is difficult for the prediction to capture or adapt to phase or load changes of applications, and 3) the prediction technique is limited to only two co-running applications. To address all of these limitations, we present Bubble-Flux, an integrated dynamic interference measurement and online QoS management mechanism to provide accurate QoS control and maximize Server Utilization. Bubble-Flux uses a Dynamic Bubble to probe Servers in real time to measure the instantaneous pressure on the shared hardware resources and precisely predict how the QoS of a latency-sensitive job will be affected by potential co-runners. Once "safe" batch jobs are selected and mapped to a Server, Bubble-Flux uses an Online Flux Engine to continuously monitor the QoS of the latency-sensitive application and control the execution of batch jobs to adapt to dynamic input, phase, and load changes to deliver satisfactory QoS. Batch applications remain in a state of flux throughout execution. Our results show that the Utilization improvement achieved by Bubble-Flux is up to 2.2x better than the prior static approach.
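
The sketch below illustrates the phase-in/phase-out idea behind the Online Flux Engine described above: batch jobs are periodically stopped and resumed, and the fraction of each period they run is adjusted from the measured QoS of the latency-sensitive job. The signal-based throttling, the measure_qos probe, and the step size are assumptions, not Bubble-Flux's implementation.

```python
import os
import signal
import time

def flux_loop(batch_pids, measure_qos, qos_target,
              period_s=1.0, duty=0.5, step=0.1):
    """Adjust the batch jobs' run/pause duty cycle each period so that the
    latency-sensitive application stays at or above its QoS target."""
    while True:
        qos = measure_qos()                  # e.g. target latency / observed latency
        if qos < qos_target:
            duty = max(duty - step, 0.0)     # QoS slipping: run batch jobs less
        else:
            duty = min(duty + step, 1.0)     # QoS healthy: reclaim Utilization
        for pid in batch_pids:
            os.kill(pid, signal.SIGCONT)     # phase the batch jobs in
        time.sleep(period_s * duty)
        for pid in batch_pids:
            os.kill(pid, signal.SIGSTOP)     # phase the batch jobs out
        time.sleep(period_s * (1.0 - duty))
```

Lowering the duty cycle trades batch throughput for QoS headroom; raising it reclaims Server Utilization when the latency-sensitive job has slack.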

Guofei Jiang - One of the best experts on this subject based on the ideXlab platform.

  • Power and performance management of virtualized computing environments via lookahead control
    Cluster Computing, 2009
    Co-Authors: Dara Marie Kusic, Nagarajan Kandasamy, James E Hanson, Jeffrey O Kephart, Guofei Jiang
    Abstract:

    There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized Server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning Servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher Server Utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized Server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a Server cluster managed by the controller conserves, on average, 22% of the power required by a system without dynamic control while still maintaining QoS goals. Finally, we use trace-based simulations to analyze controller performance on Server clusters larger than our testbed, and show how concepts from approximation theory can be used to further reduce the computational burden of controlling large systems.
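
A minimal sketch of a limited-lookahead controller in the spirit of the approach above: at each control step it enumerates candidate Server counts over a short horizon, charges energy, switching, and SLA-shortfall costs, and applies the first action of the cheapest plan. The cost constants, the per-Server capacity, and the brute-force search are illustrative assumptions, not the paper's controller.

```python
from itertools import product

POWER_PER_SERVER = 200.0     # watts per active Server, assumed
SWITCH_COST = 50.0           # cost of turning a Server on or off, assumed
SLA_PENALTY = 500.0          # penalty per unit of unmet demand, assumed
CAPACITY_PER_SERVER = 100    # requests/s each Server can absorb, assumed

def plan(current_servers, demand_forecast, max_servers=10, horizon=3):
    """Pick the next Server count by minimizing cost over all length-`horizon`
    sequences of Server counts (brute force; fine for small horizons)."""
    best_cost, best_first = float("inf"), current_servers
    for seq in product(range(1, max_servers + 1), repeat=horizon):
        cost, prev = 0.0, current_servers
        for n, demand in zip(seq, demand_forecast[:horizon]):
            cost += n * POWER_PER_SERVER               # energy cost of running n Servers
            cost += SWITCH_COST * abs(n - prev)        # provisioning (switching) cost
            shortfall = max(demand - n * CAPACITY_PER_SERVER, 0)
            cost += SLA_PENALTY * shortfall            # QoS violation penalty
            prev = n
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

print(plan(current_servers=4, demand_forecast=[350, 500, 420]))
```

Brute-force enumeration grows exponentially with the horizon, which echoes the abstract's point that approximation techniques are needed to keep the controller tractable on larger clusters.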

  • Power and Performance Management of Virtualized Computing Environments Via Lookahead Control
    2008 International Conference on Autonomic Computing, 2008
    Co-Authors: Dara Marie Kusic, Nagarajan Kandasamy, James E Hanson, Jeffrey O Kephart, Guofei Jiang
    Abstract:

    There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized Server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning Servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher Server Utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized Server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a Server cluster managed by the controller conserves, on average, 26% of the power required by a system without dynamic control while still maintaining QoS goals.
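
The abstract above notes that the optimization explicitly encodes the risk of provisioning decisions. One common way to express such risk under uncertain demand is a mean-plus-deviation penalty over sampled forecasts; the sketch below illustrates that general idea with assumed constants and is not the paper's exact formulation.

```python
import statistics

def risk_adjusted_cost(candidate_servers, demand_samples, prev_servers,
                       power_per_server=200.0, switch_cost=50.0,
                       sla_penalty=500.0, capacity=100, risk_aversion=0.5):
    """Expected provisioning cost plus a risk term (spread across demand samples)."""
    costs = []
    for demand in demand_samples:                       # e.g. drawn from a demand forecast
        c = candidate_servers * power_per_server        # energy cost
        c += switch_cost * abs(candidate_servers - prev_servers)  # switching cost
        shortfall = max(demand - candidate_servers * capacity, 0)
        c += sla_penalty * shortfall                    # QoS violation penalty
        costs.append(c)
    return statistics.mean(costs) + risk_aversion * statistics.pstdev(costs)
```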

Dara Marie Kusic - One of the best experts on this subject based on the ideXlab platform.

  • Power and performance management of virtualized computing environments via lookahead control
    Cluster Computing, 2009
    Co-Authors: Dara Marie Kusic, Nagarajan Kandasamy, James E Hanson, Jeffrey O Kephart, Guofei Jiang
    Abstract:

    There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized Server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning Servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher Server Utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized Server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a Server cluster managed by the controller conserves, on average, 22% of the power required by a system without dynamic control while still maintaining QoS goals. Finally, we use trace-based simulations to analyze controller performance on Server clusters larger than our testbed, and show how concepts from approximation theory can be used to further reduce the computational burden of controlling large systems.

  • Power and Performance Management of Virtualized Computing Environments Via Lookahead Control
    2008 International Conference on Autonomic Computing, 2008
    Co-Authors: Dara Marie Kusic, Nagarajan Kandasamy, James E Hanson, Jeffrey O Kephart, Guofei Jiang
    Abstract:

    There is growing incentive to reduce the power consumed by large-scale data centers that host online services such as banking, retail commerce, and gaming. Virtualization is a promising approach to consolidating multiple online services onto a smaller number of computing resources. A virtualized Server environment allows computing resources to be shared among multiple performance-isolated platforms called virtual machines. By dynamically provisioning virtual machines, consolidating the workload, and turning Servers on and off as needed, data center operators can maintain the desired quality-of-service (QoS) while achieving higher Server Utilization and energy efficiency. We implement and validate a dynamic resource provisioning framework for virtualized Server environments wherein the provisioning problem is posed as one of sequential optimization under uncertainty and solved using a lookahead control scheme. The proposed approach accounts for the switching costs incurred while provisioning virtual machines and explicitly encodes the corresponding risk in the optimization problem. Experiments using the Trade6 enterprise application show that a Server cluster managed by the controller conserves, on average, 26% of the power required by a system without dynamic control while still maintaining QoS goals.