Autonomic Manager

The Experts below are selected from a list of 873 Experts worldwide ranked by ideXlab platform

Ozalp Babaoglu - One of the best experts on this subject based on the ideXlab platform.

  • Towards operator-less data centers through data-driven, predictive, proactive Autonomics
    Cluster Computing, 2016
    Co-Authors: Alina Sîrbu, Ozalp Babaoglu
    Abstract:

    Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools; at that point, human intervention will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven Autonomics, where management and control are based on holistic predictive models that are built and updated using live data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating predictive models for node failures. Our results support the practicality of a data-driven approach by showing the effectiveness of predictive models based on data found in typical data center logs. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing node state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict whether nodes will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88%, with precision varying between 50% and 72%. This level of performance allows us to recover a large fraction of jobs' executions (by redirecting them to other nodes when a failure of the present node is predicted) that would otherwise have been wasted due to failures. We discuss the feasibility of including our predictive model as the central component of a data-driven Autonomic Manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available on GitHub.
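The ensemble-plus-operating-point idea in the abstract can be sketched in plain Python (a hedged illustration, not the authors' code; the base classifiers below are stand-in callables where the paper uses trained Random Forests):

```python
def ensemble_score(classifiers, x):
    """Average the probability-of-failure scores of the base classifiers.

    Each classifier is any callable returning a score in [0, 1]; in the
    paper these would be Random Forests trained on node features.
    """
    return sum(clf(x) for clf in classifiers) / len(classifiers)


def threshold_for_fpr(negative_scores, target_fpr):
    """Pick a decision threshold so that at most `target_fpr` of the
    known-healthy (negative) examples score strictly above it."""
    ranked = sorted(negative_scores, reverse=True)
    k = int(target_fpr * len(ranked))          # allowed false positives
    return ranked[k] if k < len(ranked) else ranked[-1]


def predict_failure(classifiers, x, threshold):
    """Flag a node as failing if its averaged score exceeds the threshold."""
    return ensemble_score(classifiers, x) > threshold
```

With `target_fpr=0.05` the threshold mirrors the paper's 5% false-positive operating point; the resulting true-positive rate then depends entirely on how well the base classifiers separate failing from healthy nodes.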

  • Towards Data-Driven Autonomics in Data Centers
    2016
    Co-Authors: Ozalp Babaoglu
    Abstract:

    Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools; at that point, human intervention will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven Autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict whether machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88%, with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven Autonomic Manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.
    Keywords: data science; predictive analytics; Google cluster trace; log data analysis; failure prediction; machine learning classification; ensemble classifier; random forest; BigQuery
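The windowed feature-generation step can be illustrated in miniature with plain Python in place of BigQuery (a hedged sketch; the event tuples and the mean/max aggregates below are hypothetical stand-ins for the trace's log records and the actual feature queries):

```python
from collections import defaultdict


def window_features(events, window):
    """Aggregate per-machine metrics over the most recent `window`
    time units, echoing the windowed feature queries run in BigQuery.

    events: iterable of (timestamp, machine_id, metric_value) tuples.
    Returns {machine_id: {"mean": ..., "max": ...}}.
    """
    latest = max(t for t, _, _ in events)
    per_machine = defaultdict(list)
    for t, machine, value in events:
        if t > latest - window:               # keep only the recent window
            per_machine[machine].append(value)
    return {m: {"mean": sum(vs) / len(vs), "max": max(vs)}
            for m, vs in per_machine.items()}
```

In the real study such aggregates are computed at scale over the full trace; here the same shape is produced for a handful of in-memory events.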

  • Towards Data-Driven Autonomics in Data Centers
    Cloud and Autonomic Computing (ICCAC), 2015 International Conference on, 2015
    Co-Authors: Alina Sîrbu, Ozalp Babaoglu
    Abstract:

    Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools; at that point, human intervention will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven Autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict whether machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88%, with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven Autonomic Manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.

Alina Sîrbu - One of the best experts on this subject based on the ideXlab platform.

  • Towards operator-less data centers through data-driven, predictive, proactive Autonomics
    Cluster Computing, 2016
    Co-Authors: Alina Sîrbu, Ozalp Babaoglu
    Abstract:

    Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools; at that point, human intervention will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven Autonomics, where management and control are based on holistic predictive models that are built and updated using live data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating predictive models for node failures. Our results support the practicality of a data-driven approach by showing the effectiveness of predictive models based on data found in typical data center logs. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing node state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict whether nodes will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88%, with precision varying between 50% and 72%. This level of performance allows us to recover a large fraction of jobs' executions (by redirecting them to other nodes when a failure of the present node is predicted) that would otherwise have been wasted due to failures. We discuss the feasibility of including our predictive model as the central component of a data-driven Autonomic Manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available on GitHub.
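The job-recovery mechanism the abstract mentions (redirecting work away from nodes predicted to fail) can be sketched as follows; the round-robin placement and the data shapes are illustrative assumptions, not the paper's scheduler:

```python
def place_jobs(jobs, nodes, predicted_failing):
    """Assign jobs round-robin over nodes NOT predicted to fail,
    so their executions are not wasted when a predicted failure
    actually materializes."""
    healthy = [n for n in nodes if n not in predicted_failing]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    return {job: healthy[i % len(healthy)] for i, job in enumerate(jobs)}
```

In this framing, the value of the predictive model is exactly the set difference it induces: every job steered off a correctly-predicted failing node is an execution recovered, while a false positive merely narrows the placement pool.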

  • Towards Data-Driven Autonomics in Data Centers
    Cloud and Autonomic Computing (ICCAC), 2015 International Conference on, 2015
    Co-Authors: Alina Sîrbu, Ozalp Babaoglu
    Abstract:

    Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools; at that point, human intervention will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven Autonomics, where management and control are based on holistic predictive models that are built and updated using generated data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating a predictive model for node failures. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing machine state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict whether machines will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88%, with precision varying between 50% and 72%. We discuss the practicality of including our predictive model as the central component of a data-driven Autonomic Manager and operating it on-line with live data streams (rather than off-line on data logs). All of the scripts used for BigQuery and classification analyses are publicly available from the authors' website.

Avouac Pierre-alain - One of the best experts on this subject based on the ideXlab platform.

  • Model-driven autonomic platform for building multimodal interfaces in pervasive environments
    HAL CCSD, 2013
    Co-Authors: Avouac Pierre-alain
    Abstract:

    In pervasive environments, with the proliferation of communicating devices (e.g., remote controller, gamepad, mobile phone, augmented object), users will express their needs to an enormous variety of services through a multitude of available interaction modalities, expecting the environment and its equipment to react accordingly. Addressing the challenge of managing multimodal interaction dynamically at runtime in pervasive environments, our contribution is dedicated to the software engineering of dynamic multimodal interfaces and provides: a specification language for multimodal interaction, an Autonomic Manager, and an integration platform. The Autonomic Manager uses models to generate and maintain a multimodal interaction adapted to the current conditions of the environment. The multimodal interaction data-flow from input devices to a service is then effectively realized by the integration platform. Our conceptual solution is implemented by our DynaMo platform, which is fully operational and stable. DynaMo is based on iPOJO, a dynamic service-oriented component framework built on top of OSGi, and on Cilia, a component-based mediation framework.
    Building human-computer interfaces on top of complex applications raises significant problems today and requires substantial, sustained research effort. It involves tackling increasingly diverse and complex technologies in order to build modular, evolvable interfaces that take advantage of recent progress in programming and middleware. It also involves enabling non-computer-scientists, such as ergonomics specialists, to define and deploy appropriate interfaces. Service-oriented Computing (SOC) is a recent advance in software engineering. This approach promotes modular, dynamic solutions that allow interfaces to evolve, possibly at runtime. The service-oriented approach is very promising, and many research projects are under way in enterprise integration, mobile devices, and pervasive computing. It nevertheless remains complex and demands a high level of expertise: it is hardly accessible to untrained computer scientists and completely out of reach for engineers from other disciplines, ergonomists for example. The approach proposed in this thesis is to build a workbench that manipulates abstract HCI services. These abstract services describe their functionality and dependencies at a high level of abstraction, so they can be composed more easily by engineers who are not SOC experts. The role of the workbench is then to identify concrete services implementing the abstract services, to compose them by generating the necessary glue code, and to deploy them on an execution platform. A second point concerns the specialization of the workbench: it is important to offer a service-composition language close to the domain concepts manipulated by the experts, notably ergonomists. Such a language is based on domain concepts and integrates the composition constraints specific to the domain. The current approach uses meta-models, expressing domain knowledge, to specialize the workbench.

  • Model-Driven Autonomic Framework for Building Multimodal Interfaces in Pervasive Environments
    2013
    Co-Authors: Avouac Pierre-alain
    Abstract:

    Building human-computer interfaces on top of complex applications raises significant problems today and requires substantial, sustained research effort. It involves tackling increasingly diverse and complex technologies in order to build modular, evolvable interfaces that take advantage of recent progress in programming and middleware. It also involves enabling non-computer-scientists, such as ergonomics specialists, to define and deploy appropriate interfaces. Service-oriented Computing (SOC) is a recent advance in software engineering. This approach promotes modular, dynamic solutions that allow interfaces to evolve, possibly at runtime. The service-oriented approach is very promising, and many research projects are under way in enterprise integration, mobile devices, and pervasive computing. It nevertheless remains complex and demands a high level of expertise: it is hardly accessible to untrained computer scientists and completely out of reach for engineers from other disciplines, ergonomists for example. The approach proposed in this thesis is to build a workbench that manipulates abstract HCI services. These abstract services describe their functionality and dependencies at a high level of abstraction, so they can be composed more easily by engineers who are not SOC experts. The role of the workbench is then to identify concrete services implementing the abstract services, to compose them by generating the necessary glue code, and to deploy them on an execution platform. A second point concerns the specialization of the workbench: it is important to offer a service-composition language close to the domain concepts manipulated by the experts, notably ergonomists. Such a language is based on domain concepts and integrates the composition constraints specific to the domain. The current approach uses meta-models, expressing domain knowledge, to specialize the workbench.
    In pervasive environments, with the proliferation of communicating devices (e.g., remote controller, gamepad, mobile phone, augmented object), users will express their needs to an enormous variety of services through a multitude of available interaction modalities, expecting the environment and its equipment to react accordingly. Addressing the challenge of managing multimodal interaction dynamically at runtime in pervasive environments, our contribution is dedicated to the software engineering of dynamic multimodal interfaces and provides: a specification language for multimodal interaction, an Autonomic Manager, and an integration platform. The Autonomic Manager uses models to generate and maintain a multimodal interaction adapted to the current conditions of the environment. The multimodal interaction data-flow from input devices to a service is then effectively realized by the integration platform. Our conceptual solution is implemented by our DynaMo platform, which is fully operational and stable. DynaMo is based on iPOJO, a dynamic service-oriented component framework built on top of OSGi, and on Cilia, a component-based mediation framework.

  • Autonomic Management of Multimodal Interaction: DynaMo in action
    'Association for Computing Machinery (ACM)', 2012
    Co-Authors: Avouac Pierre-alain, Lalanda Philippe, Nigay Laurence
    Abstract:

    Paper presentation (session 2: Ubicomp). Multimodal interaction can play a dual key role in pervasive environments because it provides naturalness for interacting with distributed, dynamic, and heterogeneous digitally controlled equipment, and flexibility for letting users select the interaction modalities depending on the context. The DynaMo (Dynamic multiModality) framework is dedicated to the development and runtime management of multimodal interaction in pervasive environments. This paper focuses on the Autonomic approach of DynaMo, whose originality rests on partial interaction models. The Autonomic Manager combines and completes the partial models available at runtime in order to build multimodal interaction adapted to the current execution conditions and in conformance with the predicted models. We illustrate the Autonomic solution by considering several running examples and different partial interaction models.

  • Towards Autonomic Multimodal Interaction
    'Association for Computing Machinery (ACM)', 2011
    Co-Authors: Avouac Pierre-alain, Nigay Laurence, Lalanda Philippe
    Abstract:

    Invited paper. The heterogeneity and dynamism of pervasive environments prevent building static multimodal interaction. In this paper, we present how we use the Autonomic approach to build and maintain adaptable multimodal interaction. We describe the characteristics of the adaptation, realized by an Autonomic Manager that relies on models specified by interaction designers and developers. Finally, an example with a real application and existing devices is explained.

  • Service-Oriented Autonomic Multimodal Interaction in a Pervasive Environment
    'Association for Computing Machinery (ACM)', 2011
    Co-Authors: Avouac Pierre-alain, Lalanda Philippe, Nigay Laurence
    Abstract:

    Oral session 4: Ubiquitous Interaction. The heterogeneity and dynamicity of pervasive environments require the construction of flexible multimodal interfaces at run time. In this paper, we present how we use an Autonomic approach to build and maintain adaptable input multimodal interfaces in smart building environments. We have developed an Autonomic solution relying on partial interaction models specified by interaction designers and developers. The role of the Autonomic Manager is to build complete interaction techniques based on runtime conditions and in conformity with the predicted models. The sole purpose here is to combine and complete partial models in order to obtain an appropriate multimodal interface. We illustrate our Autonomic solution by considering a running example based on an existing application and several input devices.
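The core idea running through these papers, combining and completing partial interaction models at runtime, can be sketched like this (a hypothetical data shape for illustration only; DynaMo itself is built from Java/OSGi components, not this Python structure):

```python
def complete_model(partial_model, runtime_defaults):
    """Merge a designer's partial interaction model with bindings
    discovered at runtime: slots the designer fixed take priority,
    and unfilled slots (None) fall back to runtime defaults."""
    completed = dict(runtime_defaults)
    completed.update({slot: device
                      for slot, device in partial_model.items()
                      if device is not None})
    return completed
```

The design point this captures is that the designer never has to enumerate every device: the Autonomic Manager fills the gaps from whatever the environment currently offers, while designer-specified bindings are always honored.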

M. Zubair - One of the best experts on this subject based on the ideXlab platform.

  • Scheduling-Capable Autonomic Manager for Policy-Based IT Change Management System
    Enterprise Information Systems, 2010
    Co-Authors: Hady S Abdelsalam, K. Maly, R. Mukkamala, M. Zubair, David L Kaminsky
    Abstract:

    Managing large IT environments is expensive and labour intensive. Maintaining and upgrading with minimal disruption and administrative support has always been a challenging task for system administrators. One challenge faced by IT administrators is arriving at schedules for applying one or more change requests to a system component. Most of the time, the impact analysis of the proposed changes is done by humans and is often laborious and error-prone. Although this methodology might be suitable for changes that are planned well in advance, it is completely inappropriate for changes that need to be made sooner. In addition, such manual handling does not scale with the size of the IT infrastructure. In this article, the focus is on the problem of scheduling change requests in the presence of organisational policies governing the use of resources. The authors propose two approaches to change management scheduling and present the implementation details of two prototypes that demonstrate the feasibility of the proposed approaches. The implementation is integrated with an Autonomic Manager described in the authors' earlier work.
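A minimal sketch of the scheduling problem these papers address (not the authors' algorithms; the window and request shapes are assumptions): place each change request into the earliest policy-permitted maintenance window that still has room.

```python
def schedule_changes(requests, windows):
    """requests: list of (change_id, duration); windows: list of
    (start, end) intervals permitted by organizational policy.
    Greedy first-fit: the earliest window with enough room wins."""
    free = sorted(windows)
    plan = {}
    for change_id, duration in requests:
        for i, (start, end) in enumerate(free):
            if end - start >= duration:
                plan[change_id] = (start, start + duration)
                free[i] = (start + duration, end)   # shrink the used window
                break
    return plan
```

A request that fits no window is simply left out of the plan, which is where a real change-management system would escalate to an operator or renegotiate the policy.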

  • Scheduling-Capable Autonomic Manager for Policy Based IT Change Management System
    2008 12th International IEEE Enterprise Distributed Object Computing Conference, 2008
    Co-Authors: Abdel H. Salam, K. Maly, R. Mukkamala, M. Zubair, David Kaminsky
    Abstract:

    Managing large IT environments consisting of thousands of computers, routers, switches, and printers from different vendors is expensive and labor intensive. A typical piece of equipment goes through several planned and unplanned software and hardware upgrades. Maintaining and upgrading with minimal disruption and administrative support is a challenging task. One problem faced by IT administrators in this change management process is arriving at a schedule for applying one or more change requests to the IT infrastructure. In this paper, the focus is on the problem of scheduling change requests in the presence of organizational policies governing the use, access, and availability of the IT infrastructure. We provide two approaches to scheduling and describe an implementation of a scheduler that demonstrates the feasibility of our approach. The implementation is integrated with an Autonomic Manager described in earlier papers [1].

  • Infrastructure-Aware Autonomic Manager for Change Management
    Eighth IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY'07), 2007
    Co-Authors: Abdel H. Salam, K. Maly, R. Mukkamala, M. Zubair
    Abstract:

    Typical IT environments of medium to large organizations consist of tens of networks connecting hundreds of servers, which run a large variety of business-relevant applications, usually from different vendors. Change management is an important management process that, if automated, can have a direct impact on increasing service availability in IT environments. Although such automation is considered important, the requirements of an appropriate policy engine, and of a policy language able to express both high-level and low-level policies, are far from clear. In this paper, we report our experiences in addressing these problems. In particular, we concentrate on availability policies - policies through which IT Managers express the required availability of systems - and the Autonomic Manager that enforces them.
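An availability policy of the kind described can be checked mechanically. The sketch below is a hedged illustration (the 720-hour period and the function interface are assumptions, not the paper's policy language): it tests whether a change's planned downtime stays within the budget implied by, say, a 99.9% policy.

```python
def downtime_budget(availability_pct, period_hours):
    """Hours of downtime permitted by an availability policy over a period."""
    return period_hours * (1 - availability_pct / 100)


def change_is_compliant(availability_pct, downtime_hours, period_hours=720):
    """True if the planned downtime fits the policy's budget
    (default period: a 30-day month, 720 hours)."""
    return downtime_hours <= downtime_budget(availability_pct, period_hours)
```

An Autonomic Manager enforcing such a policy would reject or reschedule any change request whose projected downtime exhausts the remaining budget for the period.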

Abdel H. Salam - One of the best experts on this subject based on the ideXlab platform.

  • Scheduling-Capable Autonomic Manager for Policy Based IT Change Management System
    2008 12th International IEEE Enterprise Distributed Object Computing Conference, 2008
    Co-Authors: Abdel H. Salam, K. Maly, R. Mukkamala, M. Zubair, David Kaminsky
    Abstract:

    Managing large IT environments consisting of thousands of computers, routers, switches, and printers from different vendors is expensive and labor intensive. A typical piece of equipment goes through several planned and unplanned software and hardware upgrades. Maintaining and upgrading with minimal disruption and administrative support is a challenging task. One problem faced by IT administrators in this change management process is arriving at a schedule for applying one or more change requests to the IT infrastructure. In this paper, the focus is on the problem of scheduling change requests in the presence of organizational policies governing the use, access, and availability of the IT infrastructure. We provide two approaches to scheduling and describe an implementation of a scheduler that demonstrates the feasibility of our approach. The implementation is integrated with an Autonomic Manager described in earlier papers [1].

  • Infrastructure-Aware Autonomic Manager for Change Management
    Eighth IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY'07), 2007
    Co-Authors: Abdel H. Salam, K. Maly, R. Mukkamala, M. Zubair
    Abstract:

    Typical IT environments of medium to large organizations consist of tens of networks connecting hundreds of servers, which run a large variety of business-relevant applications, usually from different vendors. Change management is an important management process that, if automated, can have a direct impact on increasing service availability in IT environments. Although such automation is considered important, the requirements of an appropriate policy engine, and of a policy language able to express both high-level and low-level policies, are far from clear. In this paper, we report our experiences in addressing these problems. In particular, we concentrate on availability policies - policies through which IT Managers express the required availability of systems - and the Autonomic Manager that enforces them.