Personal Assistant

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 360 Experts worldwide ranked by ideXlab platform

Lingjia Tang - One of the best experts on this subject based on the ideXlab platform.

  • Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers
    Architectural Support for Programming Languages and Operating Systems, 2016
    Co-Authors: Quan Chen, Jason Mars, Hailong Yang, Lingjia Tang
    Abstract:

    Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide the significant compute required by emerging intelligent Personal Assistant (IPA) workloads such as voice recognition, image classification, and natural language processing. It is well known that the diurnal user access pattern of user-facing services provides a strong incentive to co-locate applications for better accelerator utilization and efficiency, and prior work has focused on enabling co-location on multicore processors. However, interference when co-locating applications on non-preemptive accelerators is fundamentally different from contention on multicore CPUs and introduces a new set of challenges for reducing QoS violations. To address this open problem, we first identify the underlying causes of QoS violations in accelerator-outfitted servers. Our experiments show that queuing delay for the compute resources and PCI-e bandwidth contention during data transfer are the two main factors that contribute to the long tails of user-facing applications. We then present Baymax, a runtime system that orchestrates the execution of compute tasks from different applications and mitigates PCI-e bandwidth contention to deliver the required QoS for user-facing applications and increase accelerator utilization. Using DjiNN, a deep neural network service, Sirius, an end-to-end IPA workload, and traditional applications on an Nvidia K40 GPU, our evaluation shows that Baymax improves accelerator utilization by 91.3% while achieving the desired 99%-ile latency target for user-facing applications. In fact, Baymax reduces the 99%-ile latency of user-facing applications by up to 195x over default execution.
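    The core difficulty the abstract describes can be sketched as follows: on a non-preemptive accelerator, a launched kernel runs to completion, so the queuing delay seen by a latency-critical task is the sum of the predicted durations of everything ahead of it. A minimal admission-control sketch in that spirit (not the actual Baymax implementation; the task names, durations, and QoS target below are invented for illustration) might look like this:

    ```python
    # Hedged sketch: admit best-effort tasks onto a non-preemptive accelerator
    # only when doing so cannot push a user-facing task past its QoS target.
    # All durations and task names are illustrative assumptions.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Task:
        name: str
        duration_ms: float        # predicted execution time on the accelerator
        user_facing: bool = False

    @dataclass
    class AcceleratorQueue:
        """Models a non-preemptive accelerator: once a kernel starts it runs
        to completion, so queuing delay is the sum of durations ahead."""
        pending: List[Task] = field(default_factory=list)

        def queuing_delay_ms(self) -> float:
            return sum(t.duration_ms for t in self.pending)

        def admit(self, task: Task, qos_target_ms: float) -> bool:
            # User-facing tasks are always enqueued; a batch task is admitted
            # only if total queued work would still fit under the QoS target.
            if task.user_facing or self.queuing_delay_ms() + task.duration_ms <= qos_target_ms:
                self.pending.append(task)
                return True
            return False

    q = AcceleratorQueue()
    q.admit(Task("speech-recognition", 20.0, user_facing=True), qos_target_ms=50.0)
    ok = q.admit(Task("batch-training", 25.0), qos_target_ms=50.0)       # 20 + 25 <= 50: admitted
    blocked = q.admit(Task("batch-analytics", 10.0), qos_target_ms=50.0)  # 45 + 10 > 50: rejected
    print(ok, blocked)  # True False
    ```

    The same duration-prediction idea would also apply to PCI-e transfers, the second long-tail source the abstract identifies, by treating the bus as another non-preemptive queue.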

  • Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse-Scale Computers
    Architectural Support for Programming Languages and Operating Systems, 2015
    Co-Authors: Johann Hauswald, Michael A Laurenzano, Yunqi Zhang, Austin Rovinski, Arjun Khurana, Ronald G Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, Jason Mars
    Abstract:

    As user demand scales for intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current datacenter architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this paper, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of 7 benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 10x and 16x, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of datacenters by 2.6x and 1.4x, respectively.
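    The shape of the TCO analysis the abstract describes can be illustrated with back-of-the-envelope arithmetic: for a fixed throughput target, a faster (accelerated) server handles more queries, so fewer servers are provisioned, and the TCO comparison is the ratio of fleet costs. The dollar figures and per-server throughputs below are made-up placeholders, not numbers from the paper; only the method is being shown.

    ```python
    # Hedged sketch of a fixed-throughput TCO comparison.
    # All costs and per-server QPS values are invented placeholders.
    import math

    def fleet_tco(target_qps: float, qps_per_server: float, cost_per_server: float) -> float:
        """Servers needed to meet the throughput target, times cost per server."""
        servers = math.ceil(target_qps / qps_per_server)
        return servers * cost_per_server

    # CPU baseline vs. a hypothetical GPU server that is ~10x faster but pricier.
    baseline = fleet_tco(10_000, qps_per_server=10, cost_per_server=5_000)
    gpu      = fleet_tco(10_000, qps_per_server=100, cost_per_server=12_000)
    print(baseline, gpu)  # 5000000 1200000
    ```

    With these placeholder numbers the accelerated fleet wins despite the higher per-server cost; the paper's measured 2.6x (GPU) and 1.4x (FPGA) reductions come from the same kind of calculation with real performance and power data.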

Jason Mars - One of the best experts on this subject based on the ideXlab platform.

  • Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers
    Architectural Support for Programming Languages and Operating Systems, 2016
    Co-Authors: Quan Chen, Jason Mars, Hailong Yang, Lingjia Tang
    Abstract:

    Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide the significant compute required by emerging intelligent Personal Assistant (IPA) workloads such as voice recognition, image classification, and natural language processing. It is well known that the diurnal user access pattern of user-facing services provides a strong incentive to co-locate applications for better accelerator utilization and efficiency, and prior work has focused on enabling co-location on multicore processors. However, interference when co-locating applications on non-preemptive accelerators is fundamentally different from contention on multicore CPUs and introduces a new set of challenges for reducing QoS violations. To address this open problem, we first identify the underlying causes of QoS violations in accelerator-outfitted servers. Our experiments show that queuing delay for the compute resources and PCI-e bandwidth contention during data transfer are the two main factors that contribute to the long tails of user-facing applications. We then present Baymax, a runtime system that orchestrates the execution of compute tasks from different applications and mitigates PCI-e bandwidth contention to deliver the required QoS for user-facing applications and increase accelerator utilization. Using DjiNN, a deep neural network service, Sirius, an end-to-end IPA workload, and traditional applications on an Nvidia K40 GPU, our evaluation shows that Baymax improves accelerator utilization by 91.3% while achieving the desired 99%-ile latency target for user-facing applications. In fact, Baymax reduces the 99%-ile latency of user-facing applications by up to 195x over default execution.

  • Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse-Scale Computers
    Architectural Support for Programming Languages and Operating Systems, 2015
    Co-Authors: Johann Hauswald, Michael A Laurenzano, Yunqi Zhang, Austin Rovinski, Arjun Khurana, Ronald G Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, Jason Mars
    Abstract:

    As user demand scales for intelligent Personal Assistants (IPAs) such as Apple's Siri, Google's Google Now, and Microsoft's Cortana, we are approaching the computational limits of current datacenter architectures. It is an open question how future server architectures should evolve to enable this emerging class of applications, and the lack of an open-source IPA workload is an obstacle in addressing this question. In this paper, we present the design of Sirius, an open end-to-end IPA web-service application that accepts queries in the form of voice and images, and responds with natural language. We then use this workload to investigate the implications of four points in the design space of future accelerator-based server architectures spanning traditional CPUs, GPUs, manycore throughput co-processors, and FPGAs. To investigate future server designs for Sirius, we decompose Sirius into a suite of 7 benchmarks (Sirius Suite) comprising the computationally intensive bottlenecks of Sirius. We port Sirius Suite to a spectrum of accelerator platforms and use the performance and power trade-offs across these platforms to perform a total cost of ownership (TCO) analysis of various server design points. In our study, we find that accelerators are critical for the future scalability of IPA services. Our results show that GPU- and FPGA-accelerated servers improve the query latency on average by 10x and 16x, respectively. For a given throughput, GPU- and FPGA-accelerated servers can reduce the TCO of datacenters by 2.6x and 1.4x, respectively.

Melinda Gervasio - One of the best experts on this subject based on the ideXlab platform.

  • An Intelligent Personal Assistant for Task and Time Management
    AI Magazine, 2007
    Co-Authors: Karen Myers, Pauline Berry, Ken Conley, Jim Blythe, Melinda Gervasio
    Abstract:

    We describe an intelligent Personal Assistant that has been developed to aid a busy knowledge worker in managing time commitments and performing tasks. The design of the system was motivated by the complementary objectives of (1) relieving the user of routine tasks, thus allowing her to focus on tasks that critically require human problem-solving skills, and (2) intervening in situations where cognitive overload leads to oversights or mistakes by the user. The system draws on a diverse set of AI technologies that are linked within a Belief-Desire-Intention (BDI) agent system. Although the system provides a number of automated functions, the overall framework is highly user centric in its support for human needs, responsiveness to human inputs, and adaptivity to user working style and preferences.
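    The Belief-Desire-Intention architecture the abstract refers to can be sketched as a simple deliberation loop: the agent holds beliefs about the world, has desires (goals), and commits to intentions by selecting plans whose preconditions its beliefs satisfy. The beliefs, desires, and plan rules below are invented examples for illustration, not the system's actual knowledge base.

    ```python
    # Hedged sketch of a minimal BDI deliberation step.
    # Beliefs, desires, and the plan library are illustrative assumptions.

    beliefs = {"meeting_at_3pm": True, "report_due_today": True}
    desires = ["attend_meeting", "finish_report"]

    # Plan library: desire -> (precondition belief, action to intend)
    plans = {
        "attend_meeting": ("meeting_at_3pm", "block 3pm on calendar"),
        "finish_report": ("report_due_today", "remind user at 1pm"),
    }

    def deliberate(beliefs: dict, desires: list) -> list:
        """Commit to intentions: actions for desires whose preconditions hold."""
        intentions = []
        for desire in desires:
            precondition, action = plans[desire]
            if beliefs.get(precondition):
                intentions.append(action)
        return intentions

    print(deliberate(beliefs, desires))
    # ['block 3pm on calendar', 'remind user at 1pm']
    ```

    A real BDI system re-runs this loop as beliefs are updated by perception, which is how a time-management assistant can intervene when the user's situation changes.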

  • Deploying a Personalized Time Management Agent
    Adaptive Agents and Multi-Agents Systems, 2006
    Co-Authors: Pauline M Berry, Melinda Gervasio, Bart Peintner, Ken Conley, Tomas E Uribe, Neil Yorke-Smith
    Abstract:

    We report on our ongoing practical experience in designing, implementing, and deploying PTIME, a Personalized agent for time management and meeting scheduling in an open, multi-agent environment. In developing PTIME as part of a larger assistive agent called CALO, we have faced numerous challenges, including usability, multi-agent coordination, scalable constraint reasoning, robust execution, and unobtrusive learning. Our research advances basic solutions to the fundamental problems; however, integrating PTIME into a deployed system has raised other important issues for the successful adoption of new technology. As a Personal Assistant, PTIME must integrate easily into a user's real environment, support her normal workflow, respect her authority and privacy, provide natural user interfaces, and handle the issues that arise with deploying such a system in an open environment.

Karen Myers - One of the best experts on this subject based on the ideXlab platform.

  • An Intelligent Personal Assistant for Task and Time Management
    AI Magazine, 2007
    Co-Authors: Karen Myers, Pauline Berry, Ken Conley, Jim Blythe, Melinda Gervasio
    Abstract:

    We describe an intelligent Personal Assistant that has been developed to aid a busy knowledge worker in managing time commitments and performing tasks. The design of the system was motivated by the complementary objectives of (1) relieving the user of routine tasks, thus allowing her to focus on tasks that critically require human problem-solving skills, and (2) intervening in situations where cognitive overload leads to oversights or mistakes by the user. The system draws on a diverse set of AI technologies that are linked within a Belief-Desire-Intention (BDI) agent system. Although the system provides a number of automated functions, the overall framework is highly user centric in its support for human needs, responsiveness to human inputs, and adaptivity to user working style and preferences.

Quan Chen - One of the best experts on this subject based on the ideXlab platform.

  • Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers
    Architectural Support for Programming Languages and Operating Systems, 2016
    Co-Authors: Quan Chen, Jason Mars, Hailong Yang, Lingjia Tang
    Abstract:

    Modern warehouse-scale computers (WSCs) are being outfitted with accelerators to provide the significant compute required by emerging intelligent Personal Assistant (IPA) workloads such as voice recognition, image classification, and natural language processing. It is well known that the diurnal user access pattern of user-facing services provides a strong incentive to co-locate applications for better accelerator utilization and efficiency, and prior work has focused on enabling co-location on multicore processors. However, interference when co-locating applications on non-preemptive accelerators is fundamentally different from contention on multicore CPUs and introduces a new set of challenges for reducing QoS violations. To address this open problem, we first identify the underlying causes of QoS violations in accelerator-outfitted servers. Our experiments show that queuing delay for the compute resources and PCI-e bandwidth contention during data transfer are the two main factors that contribute to the long tails of user-facing applications. We then present Baymax, a runtime system that orchestrates the execution of compute tasks from different applications and mitigates PCI-e bandwidth contention to deliver the required QoS for user-facing applications and increase accelerator utilization. Using DjiNN, a deep neural network service, Sirius, an end-to-end IPA workload, and traditional applications on an Nvidia K40 GPU, our evaluation shows that Baymax improves accelerator utilization by 91.3% while achieving the desired 99%-ile latency target for user-facing applications. In fact, Baymax reduces the 99%-ile latency of user-facing applications by up to 195x over default execution.