Server and Network

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 165 Experts worldwide ranked by ideXlab platform

Arvind Krishnamurthy - One of the best experts on this subject based on the ideXlab platform.

  • Parameter Hub: A Rack-Scale Parameter Server for Distributed Deep Neural Network Training
    Symposium on Cloud Computing, 2018
    Co-Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
    Abstract:

    Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high-performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high-performance, multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art cloud-based distributed training techniques for image classification workloads, with 25% better throughput per dollar.

  • Parameter Hub: A Rack-Scale Parameter Server for Distributed Deep Neural Network Training
    arXiv: Distributed Parallel and Cluster Computing, 2018
    Co-Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
    Abstract:

    Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high-performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high-performance, multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art distributed training techniques for cloud-based ImageNet workloads, with 25% better throughput per dollar.
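
The parameter-server pattern that the PHub abstracts refer to can be illustrated with a small sketch. This is not PHub's design or API: the class `ToyParameterServer`, its `push`/`pull` methods, the synchronous averaging rule, and the learning rate are all assumptions chosen for illustration. It only shows the basic exchange in which workers push gradients and pull updated weights, which is the communication step the abstracts identify as the bottleneck.

```python
from typing import Dict, List


class ToyParameterServer:
    """Hypothetical in-process parameter server (illustration only, not PHub)."""

    def __init__(self, weights: Dict[str, List[float]], num_workers: int, lr: float = 0.1):
        # Copy the initial weights so callers cannot mutate server state.
        self.weights = {k: list(v) for k, v in weights.items()}
        self.num_workers = num_workers
        self.lr = lr
        # Gradients received so far for each key in the current step.
        self._pending: Dict[str, List[List[float]]] = {k: [] for k in weights}

    def push(self, key: str, grad: List[float]) -> None:
        """Buffer one worker's gradient; once every worker has reported,
        apply a single SGD step using the element-wise average."""
        buf = self._pending[key]
        buf.append(list(grad))
        if len(buf) == self.num_workers:
            avg = [sum(col) / self.num_workers for col in zip(*buf)]
            self.weights[key] = [w - self.lr * g for w, g in zip(self.weights[key], avg)]
            buf.clear()

    def pull(self, key: str) -> List[float]:
        """Return a copy of the current weights for one key."""
        return list(self.weights[key])


# Two workers train one two-element parameter vector.
ps = ToyParameterServer({"layer0": [1.0, 2.0]}, num_workers=2, lr=0.5)
ps.push("layer0", [1.0, 0.0])       # worker 0
before = ps.pull("layer0")          # still the initial weights
ps.push("layer0", [3.0, 2.0])       # worker 1 completes the step
updated = ps.pull("layer0")
```

In a real deployment every `push` and `pull` crosses the network, so the per-step traffic grows with model size and worker count, which is why the abstracts emphasize balanced server and network resources.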

K K Viswanathan - One of the best experts on this subject based on the ideXlab platform.

  • Reliable Component Instance for Multi-tenant Software as a Service Application
    Computational Science and Engineering, 2019
    Co-Authors: M D Samrajesh, K K Viswanathan
    Abstract:

    Multitenant Software as a Service (SaaS) applications provide a high level of application customization, enhanced process flow and access control, using a shared instance of the application. Cloud reliability is complex due to its resource intricacy and massiveness. The proposed solution considers reliability of the application during design and runtime. A reliable application needs to acclimate dynamically to the conditions in the cloud to enhance the reliability of its service. A Component Instance (CI) is dynamically created based on the application's load and placed at different availability zones to enhance the reliability of the application. The objective of the proposed solution is to minimize multi-tenant SaaS application reliability risk, considering the reliability score of datacenter server energy, server capacity, and network capacity. Our evaluation and discussions show that the proposed reliable framework enhances the reliability of the application and reduces the application risk factor.

  • CSE/EUC - Reliable Component Instance for Multi-tenant Software as a Service Application
    2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computi, 2019
    Co-Authors: M D Samrajesh, K K Viswanathan
    Abstract:

    Multitenant Software as a Service (SaaS) applications provide a high level of application customization, enhanced process flow and access control, using a shared instance of the application. Cloud reliability is complex due to its resource intricacy and massiveness. The proposed solution considers reliability of the application during design and runtime. A reliable application needs to acclimate dynamically to the conditions in the cloud to enhance the reliability of its service. A Component Instance (CI) is dynamically created based on the application's load and placed at different availability zones to enhance the reliability of the application. The objective of the proposed solution is to minimize multi-tenant SaaS application reliability risk, considering the reliability score of datacenter server energy, server capacity, and network capacity. Our evaluation and discussions show that the proposed reliable framework enhances the reliability of the application and reduces the application risk factor.

Liang Luo - One of the best experts on this subject based on the ideXlab platform.

  • Parameter Hub: A Rack-Scale Parameter Server for Distributed Deep Neural Network Training
    Symposium on Cloud Computing, 2018
    Co-Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
    Abstract:

    Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high-performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high-performance, multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art cloud-based distributed training techniques for image classification workloads, with 25% better throughput per dollar.

  • Parameter Hub: A Rack-Scale Parameter Server for Distributed Deep Neural Network Training
    arXiv: Distributed Parallel and Cluster Computing, 2018
    Co-Authors: Liang Luo, Jacob Nelson, Luis Ceze, Amar Phanishayee, Arvind Krishnamurthy
    Abstract:

    Distributed deep neural network (DDNN) training constitutes an increasingly important workload that frequently runs in the cloud. Larger DNN models and faster compute engines are shifting DDNN training bottlenecks from computation to communication. This paper characterizes DDNN training to precisely pinpoint these bottlenecks. We found that timely training requires high-performance parameter servers (PSs) with optimized network stacks and gradient processing pipelines, as well as server and network hardware with balanced computation and communication resources. We therefore propose PHub, a high-performance, multi-tenant, rack-scale PS design. PHub co-designs the PS software and hardware to accelerate rack-level and hierarchical cross-rack parameter exchange, with an API compatible with many DDNN training frameworks. PHub provides a performance improvement of up to 2.7x compared to state-of-the-art distributed training techniques for cloud-based ImageNet workloads, with 25% better throughput per dollar.

Raouf Boutaba - One of the best experts on this subject based on the ideXlab platform.

  • VDC Planner: Dynamic Migration-Aware Virtual Data Center Embedding for Clouds
    Integrated Network Management, 2013
    Co-Authors: Mohamed Faten Zhani, Qi Zhang, Gwendal Simon, Raouf Boutaba
    Abstract:

    Cloud computing promises to provide computing resources to a large number of service applications in an on-demand manner. Traditionally, cloud providers such as Amazon only provide guaranteed allocation for compute and storage resources, and fail to support bandwidth requirements and performance isolation among these applications. To address this limitation, a number of recent proposals advocate providing both guaranteed server and network resources in the form of Virtual Data Centers (VDCs). This raises the problem of optimally allocating both servers and data center networks to multiple VDCs in order to maximize the total revenue, while minimizing the total energy consumption in the data center. However, despite recent studies on this problem, none of the existing solutions have considered the possibility of using VM migration to dynamically adjust the resource allocation, in order to meet the fluctuating resource demand of VDCs. In this paper, we propose VDC Planner, a migration-aware dynamic virtual data center embedding framework that aims at achieving high revenue while minimizing the total energy cost over time. Our framework supports various usage scenarios, including VDC embedding, VDC scaling as well as dynamic VDC consolidation. Through experiments using realistic workload traces, we show our proposed approach achieves both higher revenue and lower average scheduling delay compared to existing migration-oblivious solutions.

  • VDC Planner: Dynamic Migration-Aware Virtual Data Center Embedding for Clouds
    2013
    Co-Authors: Mohamed Faten Zhani, Qi Zhang, Gwendal Simon, Raouf Boutaba
    Abstract:

    Cloud computing promises to provide computing resources to a large number of service applications in an on-demand manner. Traditionally, cloud providers such as Amazon only provide guaranteed allocation for compute and storage resources, and fail to support the bandwidth requirements and performance isolation among these applications. To address this limitation, a number of recent proposals advocate providing both guaranteed server and network resources in the form of Virtual Data Centers (VDCs). This raises the problem of optimally allocating both server resources and data center networks to multiple VDCs in order to optimize total revenue, while minimizing the total energy consumption in the data center. However, despite recent studies on this problem, none of the existing solutions have considered the possibility of using VM migration to dynamically adjust the resource allocation, in order to meet the fluctuating resource demand of VDCs. In this paper, we propose VDC Planner, a migration-aware dynamic virtual data center embedding framework that aims at achieving high revenue while minimizing the total energy cost over time. Our framework supports various usage scenarios, including VDC embedding, VDC scaling as well as dynamic VDC consolidation. Through experiments using realistic workload traces, we show our proposed approach achieves both higher revenue and lower average scheduling delay compared to existing solutions in the literature.
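
To make the embedding problem in the VDC Planner abstracts concrete, here is a toy first-fit placement sketch. It is not VDC Planner's algorithm, which the abstracts do not describe in detail: the `Server` record, the `embed_vdc` function, and the single per-server bandwidth figure are hypothetical simplifications, and there is no revenue, energy, or migration modeling. It only shows what it means to allocate both server and network capacity to a VDC as a unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Server:
    """Hypothetical physical server with remaining capacity."""
    name: str
    cpu_free: float
    bw_free: float  # free uplink bandwidth (simplified network model)


def embed_vdc(vms: List[Tuple[str, float, float]],
              servers: List[Server]) -> Optional[Dict[str, str]]:
    """Place every (vm_name, cpu, bandwidth) request with first-fit.

    The embedding is all-or-nothing: if any VM cannot be placed, all
    reservations made for this VDC are rolled back and None is returned.
    """
    placement: Dict[str, str] = {}
    reserved: List[Tuple[Server, float, float]] = []
    for vm_name, cpu, bw in vms:
        # First server with enough CPU and enough bandwidth left.
        host = next((s for s in servers
                     if s.cpu_free >= cpu and s.bw_free >= bw), None)
        if host is None:
            for s, c, b in reserved:  # undo partial reservations
                s.cpu_free += c
                s.bw_free += b
            return None
        host.cpu_free -= cpu
        host.bw_free -= bw
        reserved.append((host, cpu, bw))
        placement[vm_name] = host.name
    return placement


servers = [Server("s1", 4.0, 10.0), Server("s2", 8.0, 10.0)]
accepted = embed_vdc([("web", 2.0, 5.0), ("db", 4.0, 5.0)], servers)
rejected = embed_vdc([("cache", 2.0, 1.0), ("big", 16.0, 1.0)], servers)
```

A migration-aware approach of the kind the abstracts describe would additionally be able to move already-placed VMs to make room for a request, instead of rejecting it as this sketch does.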

Ouns Bouachir - One of the best experts on this subject based on the ideXlab platform.

  • ISCC - Energy Efficiency in SDDC: Considering Server and Network Utilities
    2020 IEEE Symposium on Computers and Communications (ISCC), 2020
    Co-Authors: Beakal Gizachew Assefa, Oznur Ozkasap, Ipek Kizil, Moayad Aloqaily, Ouns Bouachir
    Abstract:

    Software Defined Networking (SDN) has eased the management and control of networks through separation of the control and data planes. Software defined data centers (SDDC) automate the management of end systems, which are physical machines and virtual machines. In data centers, although there is a vast body of work on minimizing the power consumption of physical machines and on virtual machine migration performance, the energy efficiency of the network components has been given little attention. In this paper, a software-based energy efficiency framework that jointly minimizes the power consumption of end systems and network components in SDDC is proposed. Moreover, a novel physical server utility interval based metric, namely Ratio for Energy Saving of Physical Machines (RESPM), which measures how energy efficient the physical servers are with respect to the virtual machines residing within them, is proposed. To jointly maximize network energy efficiency and RESPM values, an Integer Programming (IP) formulation has been introduced. Experiments conducted on real-world virtual migration traces show that the proposed framework jointly reduces the power consumption of end systems and network components. The system has shown an improvement of 9% in RESPM, a 35% energy saving in Ratio of Energy Saving in SDN (RESDN), and more than 50% in link savings.