System State Data

The Experts below are selected from a list of 96 Experts worldwide, ranked by the ideXlab platform.

William Kramer - One of the best experts on this subject based on the ideXlab platform.

  • Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems
    2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
    Co-Authors: Ana Gainaru, Franck Cappello, William Kramer
    Abstract:

    HPC Systems are complex machines that generate a huge volume of System State Data called "events". Events are not generated according to any general, consistent rule, and the different hardware and software components of such Systems have different failure rates. Distinguishing between normal System behaviour and faulty situations relies on event analysis. Being able to quickly detect deviations from normality is essential for System administration and is the foundation of fault prediction. As HPC Systems continue to grow in size and complexity, mining event flows becomes more challenging, and with the upcoming 10 Petaflop Systems there is a lot of interest in this topic. Current event mining approaches do not take the specific behaviour of each type of event into consideration and, as a consequence, fail to analyze events according to their characteristics. In this paper we propose a novel way of characterizing the normal and faulty behaviour of the System by using signal analysis concepts. Together, the analysis modules form ELSA (Event Log Signal Analyzer), a toolkit whose purpose is to model the normal flow of each State event during an HPC System's lifetime and how that flow is affected when a failure hits the System. We show that these extracted models provide an accurate view of the System output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale System. We show that by analyzing each event according to its specific behaviour, we get a more realistic overview of the entire System.
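
To make the signal-analysis idea concrete, the sketch below shows one way to turn an event log into per-event-type signals and flag time bins that deviate from the normal flow. It is a minimal illustration in the spirit of ELSA, not the authors' toolkit; the log format, bin size, and threshold are assumptions.

```python
from collections import Counter, defaultdict
import statistics

# Hypothetical log format: (timestamp_seconds, event_type) tuples.
# This only illustrates the core idea of treating each event type as
# its own signal; it does not reproduce ELSA itself.

def events_to_signals(events, bin_size=3600):
    """Bin event counts per type into fixed-width time windows."""
    signals = defaultdict(Counter)
    for ts, etype in events:
        signals[etype][ts // bin_size] += 1
    return signals

def flag_deviations(signal, n_bins, threshold=2.0):
    """Flag bins whose count deviates from the signal's mean by more
    than `threshold` standard deviations (a crude normality model)."""
    counts = [signal.get(b, 0) for b in range(n_bins)]
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts) or 1.0
    return [b for b, c in enumerate(counts) if abs(c - mean) / stdev > threshold]

# Example: a burst of ECC errors in bin 5 stands out against the normal flow.
log = [(h * 3600, "ecc_error") for h in (0, 1, 2, 3, 4)] + \
      [(5 * 3600 + i, "ecc_error") for i in range(50)]
signals = events_to_signals(log)
print(flag_deviations(signals["ecc_error"], n_bins=6))  # -> [5]
```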

Ana Gainaru - One of the best experts on this subject based on the ideXlab platform.

  • Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems
    2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
    Co-Authors: Ana Gainaru, Franck Cappello, William Kramer
    Abstract:

    HPC Systems are complex machines that generate a huge volume of System State Data called "events". Events are not generated according to any general, consistent rule, and the different hardware and software components of such Systems have different failure rates. Distinguishing between normal System behaviour and faulty situations relies on event analysis. Being able to quickly detect deviations from normality is essential for System administration and is the foundation of fault prediction. As HPC Systems continue to grow in size and complexity, mining event flows becomes more challenging, and with the upcoming 10 Petaflop Systems there is a lot of interest in this topic. Current event mining approaches do not take the specific behaviour of each type of event into consideration and, as a consequence, fail to analyze events according to their characteristics. In this paper we propose a novel way of characterizing the normal and faulty behaviour of the System by using signal analysis concepts. Together, the analysis modules form ELSA (Event Log Signal Analyzer), a toolkit whose purpose is to model the normal flow of each State event during an HPC System's lifetime and how that flow is affected when a failure hits the System. We show that these extracted models provide an accurate view of the System output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale System. We show that by analyzing each event according to its specific behaviour, we get a more realistic overview of the entire System.

Enzo Morosini Frazzon - One of the best experts on this subject based on the ideXlab platform.

  • REVIEW OF SIMULATION-BASED OPTIMIZATION APPROACHES FOR THE ADAPTIVE SCHEDULING AND CONTROL OF DYNAMIC PRODUCTION SYSTEMS
    DEStech Transactions on Engineering and Technology Research, 2018
    Co-Authors: Ricardo Pimentel, Enzo Morosini Frazzon, Pedro Pfeifer Portela Santos
    Abstract:

    The performance of complex manufacturing Systems in fast-moving and competitive environments is directly influenced by the scheduling and control of production, which need to cope with the Systems' complexity and stochastic behaviour as well as handle internal and external dynamic influences. In this context, cyber-physical Systems and Industry 4.0 concepts and technologies, which allow the real-time availability of System State Data and the interconnectedness of information and material flows, can support the enhancement of scheduling and control processes. In this direction, the adaptive scheduling and control of dynamic production Systems by means of hybrid simulation and optimisation approaches, which can take advantage of System State Data visibility, possesses great scientific and industrial relevance. This work aims to report the State of the art in simulation-based optimisation of dynamic production Systems supported by cyber-physical Systems. The research systematically analyses the published literature and outlines the main trends as well as the current gaps, discussing future prospects for the adaptive scheduling and control of production Systems. It was substantiated that Systems with greater complexity, dynamicity and randomness impel the application of simulation-based optimisation approaches supported by cyber-physical Systems and Industry 4.0 concepts and technologies.
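
As a rough illustration of the simulation-based optimisation pattern the review covers, the sketch below couples a stand-in stochastic simulation with a simple random-search optimiser. The objective, candidate encoding, and replication count are assumptions; no specific tool from the reviewed literature is reproduced.

```python
import random

def simulate_makespan(weights, seed):
    """Stand-in for a stochastic production simulation: returns a noisy
    makespan that is lowest when the two rule weights are balanced."""
    rng = random.Random(seed)
    imbalance = abs(weights[0] - weights[1])
    return 100 + 50 * imbalance + rng.gauss(0, 2)

def sbo(n_iterations=200, replications=5):
    """Random search stands in for the optimisation heuristic: propose a
    candidate, evaluate it by simulation, keep the best seen so far."""
    best, best_cost = None, float("inf")
    for _ in range(n_iterations):
        cand = (random.random(), random.random())  # optimiser proposal
        # Average several simulation replications to tame stochastic noise.
        cost = sum(simulate_makespan(cand, s) for s in range(replications)) / replications
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

print(sbo())
```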

  • STATE OF THE ART IN SIMULATION-BASED OPTIMIZATION APPROACHES FOR VEHICLE ROUTING PROBLEMS ALONG MANUFACTURING SUPPLY CHAINS
    DEStech Transactions on Engineering and Technology Research, 2018
    Co-Authors: D.e. Mazzuco, Djonathan Luiz De Oliveira, Enzo Morosini Frazzon
    Abstract:

    Transport execution directly influences the operational performance of distributed production Systems. In order to attain efficiency, many decisions have to be taken, such as (i) the choice of the best route, which minimizes travelling distance; (ii) the definition of a proper schedule, which improves displacement timing; and (iii) the selection of the transportation mode. Transport routing and scheduling present stochastic characteristics which can be described by probability functions. In the recent literature, it has been suggested that complex stochastic problems can be solved using simulation-based optimization (SBO) approaches. SBO combines the power of optimization heuristics with the advantages of simulation models, which can evaluate the effect of parameter changes even on very complex Systems. Since it is capable of capturing the relationships and interactions among various entities and subsequently identifying a good design or solution, SBO might represent powerful support for decision-making in complex and stochastic situations such as transport routing and scheduling in distributed production Systems. Furthermore, transport Systems are highly dynamic and have to adapt permanently to a variety of oscillations, such as unpredictable demand or urgent high-priority requests, and to disturbances such as vehicle crashes. On the technological frontier, the availability of System State Data is facilitated by the introduction of cyber-physical Systems and Industry 4.0 concepts and technologies. Therefore, a new approach that employs the newly available Data, enabling real-time revision of transport routing and scheduling as operations take place, embodies a research opportunity with potential practical impact. This paper aims to report the State of the art regarding SBO approaches applied to vehicle routing and scheduling problems with pick-up and delivery (VRPPD). The paper substantiates the relevance of developing new approaches for vehicle routing and scheduling along distributed manufacturing supply chains, which embrace the new possibilities created by the widespread employment of cyber-physical Systems.
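
The sketch below illustrates the SBO idea for routing in miniature: candidate visit orders are compared by their simulated expected duration under stochastic travel times. The coordinates and travel-time model are assumptions, and pick-up/delivery precedence constraints (the defining feature of VRPPD) are omitted for brevity.

```python
import itertools
import random

DEPOT = (0.0, 0.0)
STOPS = [(2.0, 1.0), (1.0, 4.0), (5.0, 2.0)]  # hypothetical customer locations

def leg_time(a, b, rng):
    """Euclidean distance scaled by a random lognormal travel-time factor."""
    base = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return base * rng.lognormvariate(0, 0.25)

def expected_duration(order, n_samples=500):
    """Monte Carlo estimate of route duration; a fixed seed gives common
    random numbers, so candidate orders are compared fairly."""
    rng = random.Random(42)
    total = 0.0
    for _ in range(n_samples):
        route = [DEPOT] + [STOPS[i] for i in order] + [DEPOT]
        total += sum(leg_time(a, b, rng) for a, b in zip(route, route[1:]))
    return total / n_samples

# With only three stops, exhaustive enumeration replaces the heuristic search.
best = min(itertools.permutations(range(len(STOPS))), key=expected_duration)
print(best, round(expected_duration(best), 2))
```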

  • Potential of Data-driven simulation-based optimization for adaptive scheduling and control of dynamic manufacturing Systems
    2016 Winter Simulation Conference (WSC), 2016
    Co-Authors: Mirko Kück, Torsten Hildebrandt, Michael Freitag, Enzo Morosini Frazzon
    Abstract:

    The increasing customization of products, which leads to greater variance and smaller lot sizes, requires highly flexible manufacturing Systems. These Systems are subject to dynamic influences and demand increasing effort for the generation of feasible production schedules and for process control. This paper presents an approach for dealing with these challenges. First, production scheduling is executed by coupling an optimization heuristic with a simulation model. Second, real-time System State Data, to be provided by forthcoming cyber-physical Systems, is fed back so that the simulation model is continuously updated and the optimization heuristic can either adjust an existing schedule or generate a new one. The potential of the approach was tested by means of a use case embracing a semiconductor manufacturing facility, in which the simulation results were employed to support the selection of better dispatching rules, improving flexible manufacturing System performance with regard to average production cycle time.
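
A toy version of the rule-selection step is sketched below: a single-machine simulation compares FIFO and SPT dispatching by average cycle time. The job data is synthetic; in the paper's setting the model would instead be updated from real-time System State Data.

```python
import heapq
import random

def simulate(rule, jobs):
    """jobs: list of (arrival_time, processing_time) for one machine.
    Returns the average cycle (flow) time under the given dispatching rule."""
    queue, clock, done, i = [], 0.0, [], 0
    jobs = sorted(jobs)  # order by arrival time
    while i < len(jobs) or queue:
        if not queue:
            clock = max(clock, jobs[i][0])  # idle until the next arrival
        while i < len(jobs) and jobs[i][0] <= clock:
            arr, proc = jobs[i]
            key = proc if rule == "SPT" else arr  # SPT vs FIFO priority
            heapq.heappush(queue, (key, arr, proc))
            i += 1
        _, arr, proc = heapq.heappop(queue)
        clock += proc
        done.append(clock - arr)  # cycle time = completion - arrival
    return sum(done) / len(done)

rng = random.Random(1)
jobs = [(rng.uniform(0, 100), rng.uniform(1, 10)) for _ in range(200)]
for rule in ("FIFO", "SPT"):
    print(rule, round(simulate(rule, jobs), 2))
```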

Yong Wang - One of the best experts on this subject based on the ideXlab platform.

  • A System for Detecting Malicious Insider Data Theft in IaaS Cloud Environments
    2016 IEEE Global Communications Conference (GLOBECOM), 2016
    Co-Authors: Jason Nikolai, Yong Wang
    Abstract:

    The Cloud Security Alliance lists Data theft and insider attacks as critical threats to cloud security. Our work puts forth an approach using a train, monitor, detect pattern which leverages a stateful, rule-based k-nearest neighbors anomaly detection technique and System State Data to detect insider Data theft on Infrastructure as a Service (IaaS) nodes. We posit, instantiate, and demonstrate our approach using the Eucalyptus cloud computing infrastructure, where we observe a 100 percent detection rate for abnormal login events and Data copies to outside Systems.
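
The sketch below illustrates the k-nearest neighbors component of such an approach on made-up System State features; the feature set, threshold, and training data are assumptions, and the paper's stateful rules and Eucalyptus instrumentation are not reproduced.

```python
import math

def knn_score(point, training, k=3):
    """Anomaly score = mean distance to the k nearest training points."""
    dists = sorted(math.dist(point, t) for t in training)
    return sum(dists[:k]) / k

# Train: observations of normal behaviour as (hour-of-day, MB copied out).
normal = [(9, 5), (10, 8), (11, 6), (14, 7), (15, 4), (16, 9)]

THRESHOLD = 10.0  # would be tuned on held-out normal data in a real deployment
for obs in [(10, 7), (3, 500)]:  # daytime small copy vs 3 a.m. bulk copy
    score = knn_score(obs, normal)
    print(obs, "anomalous" if score > THRESHOLD else "normal", round(score, 1))
```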

Franck Cappello - One of the best experts on this subject based on the ideXlab platform.

  • Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems
    2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
    Co-Authors: Ana Gainaru, Franck Cappello, William Kramer
    Abstract:

    HPC Systems are complex machines that generate a huge volume of System State Data called "events". Events are not generated according to any general, consistent rule, and the different hardware and software components of such Systems have different failure rates. Distinguishing between normal System behaviour and faulty situations relies on event analysis. Being able to quickly detect deviations from normality is essential for System administration and is the foundation of fault prediction. As HPC Systems continue to grow in size and complexity, mining event flows becomes more challenging, and with the upcoming 10 Petaflop Systems there is a lot of interest in this topic. Current event mining approaches do not take the specific behaviour of each type of event into consideration and, as a consequence, fail to analyze events according to their characteristics. In this paper we propose a novel way of characterizing the normal and faulty behaviour of the System by using signal analysis concepts. Together, the analysis modules form ELSA (Event Log Signal Analyzer), a toolkit whose purpose is to model the normal flow of each State event during an HPC System's lifetime and how that flow is affected when a failure hits the System. We show that these extracted models provide an accurate view of the System output, which improves the effectiveness of proactive fault tolerance algorithms. Specifically, we implemented a filtering algorithm and a short-term fault prediction methodology based on the extracted model and tested them against real failure traces from a large-scale System. We show that by analyzing each event according to its specific behaviour, we get a more realistic overview of the entire System.
