Symbolic Sequence

The Experts below are selected from a list of 279 Experts worldwide ranked by ideXlab platform

Pierre-françois Marteau - One of the best experts on this subject based on the ideXlab platform.

  • Sequence Covering for Efficient Host-Based Intrusion Detection
    IEEE Transactions on Information Forensics and Security, 2019
    Co-Authors: Pierre-françois Marteau
    Abstract:

    This paper introduces a new similarity measure, the covering similarity, which we formally define for evaluating the similarity between a symbolic sequence and a set of symbolic sequences. A pairwise similarity can also be derived directly from the covering similarity to compare two symbolic sequences. An efficient implementation to compute the covering similarity is proposed that uses a suffix tree data structure, but other implementations, based on a suffix array for instance, are possible and possibly necessary for handling large-scale problems. We have used this similarity to isolate attack sequences from normal sequences in the scope of host-based intrusion detection. We have assessed the covering similarity on two well-known benchmarks in the field. In view of the results reported on these two datasets for the state-of-the-art methods, and according to the comparative study we have carried out based on three challenging similarity measures commonly used for string processing or in bioinformatics, we show that the covering similarity is particularly relevant for detecting anomalies in sequences of system calls.

  • Sequence Covering Similarity for Symbolic Sequence Comparison
    2018
    Co-Authors: Pierre-françois Marteau
    Abstract:

    This paper introduces the sequence covering similarity, which we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a pairwise distance to compare two symbolic sequences. We show that this covering distance is a semimetric. A few examples are given to show how this string metric in $O(n \cdot \log n)$ compares with the Levenshtein distance, which is in $O(n^2)$. A final example presents its application to plagiarism detection.

  • Sequence Covering Similarity for Symbolic Sequence Comparison
    arXiv: Data Structures and Algorithms, 2018
    Co-Authors: Pierre-françois Marteau
    Abstract:

    This paper introduces the sequence covering similarity, which we formally define for evaluating the similarity between a symbolic sequence (string) and a set of symbolic sequences (strings). From this covering similarity we derive a pairwise distance to compare two symbolic sequences. We show that this covering distance is a metric. A few examples are given to show how this string metric in $O(n \cdot \log n)$ compares with the Levenshtein distance, which is in $O(n^2)$. A final example presents its application to plagiarism detection.
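The abstracts above compare the covering distance against the Levenshtein distance and its $O(n^2)$ cost. As a point of reference only, here is a minimal sketch of that baseline, using the classic two-row dynamic program. It is not the covering similarity itself, whose definition is not given in the abstracts, and the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b in O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The nested loops visit every pair of positions once, which is where the quadratic cost mentioned in the abstracts comes from.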

Chiara Di Francescomarino - One of the best experts on this subject based on the ideXlab platform.

  • Complex Symbolic Sequence clustering and multiple classifiers for predictive process monitoring
    Business Process Management, 2016
    Co-Authors: Fabrizio Maria Maggi, Marlon Dumas, Ilya Verenich, Marcello La Rosa, Chiara Di Francescomarino
    Abstract:

    This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case, and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. Meanwhile, an outcome refers to a label associated with completed cases, for example a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating whether a given case led to a customer complaint. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.

  • Business Process Management Workshops - Complex Symbolic Sequence clustering and multiple classifiers for predictive process monitoring
    Business Process Management Workshops, 2016
    Co-Authors: Ilya Verenich, Fabrizio Maria Maggi, Marlon Dumas, Marcello La Rosa, Chiara Di Francescomarino
    Abstract:

    This paper addresses the following predictive business process monitoring problem: given the execution trace of an ongoing case, and given a set of traces of historical (completed) cases, predict the most likely outcome of the ongoing case. In this context, a trace refers to a sequence of events with corresponding payloads, where a payload consists of a set of attribute-value pairs. Meanwhile, an outcome refers to a label associated with completed cases, for example a label indicating that a given case completed “on time” (with respect to a given desired duration) or “late”, or a label indicating whether a given case led to a customer complaint. The paper tackles this problem via a two-phased approach. In the first phase, prefixes of historical cases are encoded using complex symbolic sequences and clustered. In the second phase, a classifier is built for each of the clusters. To predict the outcome of an ongoing case at runtime given its (uncompleted) trace, we select the closest cluster(s) to the trace in question and apply the respective classifier(s), taking into account the Euclidean distance of the trace from the center of the clusters. We consider two families of clustering algorithms – hierarchical clustering and k-medoids – and use random forests for classification. The approach was evaluated on four real-life datasets.

  • Complex Symbolic Sequence encodings for predictive monitoring of business processes
    Business Process Management, 2015
    Co-Authors: Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi
    Abstract:

    This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, given a trace of an ongoing case, and given two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification, meaning that they extract features from traces seen as sequences of event labels, and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated with each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.

  • BPM - Complex Symbolic Sequence Encodings for Predictive Monitoring of Business Processes
    Lecture Notes in Computer Science, 2015
    Co-Authors: Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi
    Abstract:

    This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, given a trace of an ongoing case, and given two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification, meaning that they extract features from traces seen as sequences of event labels, and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated with each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.
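The distinction the abstracts draw between simple symbolic sequences (event labels only) and complex symbolic sequences (events carrying a data payload) can be illustrated with a small sketch. The encoding below is one plausible fixed-length, position-by-position layout, assumed here for illustration. The abstracts do not specify the encodings the papers compare, and the names `simple_encoding` and `complex_encoding`, the attribute names, and the sample trace are all hypothetical.

```python
from typing import Any

# An event is a label plus a payload of attribute-value pairs.
Event = tuple[str, dict[str, Any]]


def simple_encoding(prefix: list[Event]) -> list[str]:
    """Simple symbolic sequence: keep the event labels, drop the payloads."""
    return [label for label, _ in prefix]


def complex_encoding(prefix: list[Event], attributes: list[str], length: int) -> list[Any]:
    """Flatten a trace prefix into a fixed-length feature vector.

    Each of the first `length` positions contributes the event label followed
    by the selected payload attributes. Missing positions and missing
    attributes are filled with None (an assumption of this sketch).
    """
    features: list[Any] = []
    for i in range(length):
        if i < len(prefix):
            label, payload = prefix[i]
            features.append(label)
            features.extend(payload.get(attr) for attr in attributes)
        else:
            features.extend([None] * (1 + len(attributes)))
    return features


if __name__ == "__main__":
    # Made-up trace prefix of an ongoing case.
    trace = [
        ("register", {"amount": 120, "channel": "web"}),
        ("check", {"amount": 120, "channel": "web"}),
    ]
    print(simple_encoding(trace))
    print(complex_encoding(trace, ["amount"], 3))
```

A vector of this kind could then be passed to a clustering step or a classifier. The simple encoding shows what is lost when payloads are ignored.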

Juan Gabriel Brida - One of the best experts on this subject based on the ideXlab platform.

  • Multiple Regimes Model Reconstruction Using Symbolic Time Series Methods
    2004
    Co-Authors: Juan Gabriel Brida
    Abstract:

    In this paper we describe and apply the methods of Symbolic Time Series Analysis to an experimental framework. We discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol sequence statistics in a model strategy. In particular, we introduce a static partition in a time series of inflation rates. This partition is based on economic criteria using the notion of economic regime. Consequently, the time series is converted into a symbolic sequence. The probabilities of occurrence of the different symbol strings constitute the symbol sequence statistics. Then a method is discussed for reconstructing a model of inflation fluctuations from measured time series data, where the symbol sequence statistics are used as the target for reconstruction. That is, we will show how the observed symbolic sequence statistics can be used as a target for measuring the goodness of fit of the proposed model.

  • Symbolic time series analysis and dynamic regimes
    Structural Change and Economic Dynamics, 2003
    Co-Authors: Juan Gabriel Brida, Lionello F. Punzo
    Abstract:

    In this paper I describe and apply the methods of Symbolic Time Series Analysis (STSA) to an experimental framework. The idea behind Symbolic Time Series Analysis is simple: the values of a given time series are transformed into a finite set of symbols, obtaining a finite string. Then, we can process the symbolic sequence using tools from information theory and symbolic dynamics. I discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol sequence statistics in a model strategy. In this application the data symbolization is based on economic criteria using the notion of economic regime.

  • Symbolic Time Series Analysis in Economics
    2000
    Co-Authors: Juan Gabriel Brida
    Abstract:

    In this paper I describe and apply the methods of Symbolic Time Series Analysis (STSA) to an experimental framework. The idea behind Symbolic Time Series Analysis is simple: the values of a given time series are transformed into a finite set of symbols, obtaining a finite string. Then, we can process the symbolic sequence using tools from information theory and symbolic dynamics. I discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol sequence statistics in a model strategy. To explain these applications, I describe methods to select the symbolization of the data (Section 2), and I introduce the symbolic sequence histograms and some tools to characterize and compare these histograms (Section 3). I show that the methods of symbolic time series analysis can be a good tool to describe and recognize time patterns in complex dynamical processes and to extract dynamical information about this kind of system. In particular, the method gives us a language in which to express and analyze these time patterns. In Section 4 I report some applications of STSA to study the evolution of different economies. In these applications data symbolization is based on economic criteria using the notion of economic regime introduced earlier in this thesis. I use STSA methods to describe the dynamical behavior of these economies and to do comparative analysis of their regime dynamics. In Section 5 I use STSA to reconstruct a model of a dynamical system from measured time series data. In particular, I will show how the observed symbolic sequence statistics can be used as a target for measuring the goodness of fit of proposed models.
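The two steps these abstracts describe, converting a time series into a symbolic sequence with a static partition and then collecting symbol sequence statistics, can be sketched as follows. This is a minimal illustration under stated assumptions: the thresholds, the alphabet, and the sample values are made up and do not reproduce the economic-regime criteria of the papers, and the names `symbolize` and `word_statistics` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def symbolize(series, thresholds, alphabet):
    """Map each value to the symbol of the partition cell it falls into.

    `thresholds` must be sorted in increasing order and define
    len(thresholds) + 1 cells, one per symbol in `alphabet`.
    """
    if len(alphabet) != len(thresholds) + 1:
        raise ValueError("need exactly one symbol per partition cell")
    return "".join(alphabet[bisect_right(thresholds, x)] for x in series)


def word_statistics(symbols, word_length):
    """Relative frequency of each overlapping word of `word_length` symbols."""
    words = [symbols[i:i + word_length]
             for i in range(len(symbols) - word_length + 1)]
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


if __name__ == "__main__":
    # Made-up series standing in for measured rates.
    rates = [1.0, 2.5, 7.0, 12.0, 3.0, 1.5]
    sequence = symbolize(rates, thresholds=[2.0, 10.0], alphabet="LMH")
    print(sequence)
    print(word_statistics(sequence, 2))
```

The resulting dictionary plays the role of a symbol sequence histogram. Comparing such histograms for observed and simulated series is one way to use them as a goodness-of-fit target, as the abstracts suggest.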
