Outlier Detection

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 22776 Experts worldwide ranked by ideXlab platform

Cungen Cao - One of the best experts on this subject based on the ideXlab platform.

  • Some issues about Outlier Detection in rough set theory
    Expert Systems with Applications, 2009
    Co-Authors: Feng Jiang, Yuefei Sui, Cungen Cao
    Abstract:

    "One person's noise is another person's signal" (Knorr, E., Ng, R. (1998). Algorithms for mining distance-based Outliers in large datasets. In Proceedings of the 24th VLDB conference, New York (pp. 392-403)). In recent years, much attention has been given to the problem of Outlier Detection, whose aim is to detect Outliers - objects which behave in an unexpected way or have abnormal properties. Detecting such Outliers is important for many applications, such as identifying criminal activities in electronic commerce, computer intrusion attacks, terrorist threats, agricultural pest infestations, etc. Outlier Detection is critically important in the information-based society. In this paper, we discuss some issues about Outlier Detection in rough set theory, which emerged about 20 years ago and is nowadays a rapidly developing branch of artificial intelligence and soft computing. First, we propose a novel definition of Outliers in information systems of rough set theory: sequence-based Outliers. An algorithm to find such Outliers in rough set theory is also given. The effectiveness of the sequence-based method for Outlier Detection is demonstrated on two publicly available databases. Second, we introduce traditional distance-based Outlier Detection to rough set theory and discuss the definitions of distance metrics for distance-based Outlier Detection in rough set theory.
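
    The distance-based notion cited above (Knorr and Ng, 1998) can be illustrated with a small sketch. It assumes the usual DB(p, D) formulation, in which an object is an Outlier if at least a fraction p of all objects lie farther than D from it. The overlap distance on categorical attributes, the function names, and the toy table are illustrative assumptions, not the metrics or data used in the paper.

```python
from typing import List, Sequence


def overlap_distance(a: Sequence, b: Sequence) -> int:
    """Number of attributes on which two objects differ (an assumed metric)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def db_outliers(objects: List[Sequence], p: float, d: float) -> List[int]:
    """Return indices of DB(p, d)-outliers: objects for which at least a
    fraction p of all objects lie at distance greater than d."""
    n = len(objects)
    result = []
    for i, obj in enumerate(objects):
        far = sum(1 for other in objects if overlap_distance(obj, other) > d)
        if far >= p * n:
            result.append(i)
    return result


# Hypothetical information table with three categorical attributes.
table = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
]
outlier_indices = db_outliers(table, p=0.75, d=1)
print(outlier_indices)
```

    This brute-force version compares every pair of objects, so it is quadratic in the number of objects; it is meant only to make the definition concrete.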

  • Outlier Detection based on rough membership function
    Lecture Notes in Computer Science, 2006
    Co-Authors: Feng Jiang, Yuefei Sui, Cungen Cao
    Abstract:

    In recent years, much attention has been given to the problem of Outlier Detection, whose aim is to detect Outliers - individuals who behave in an unexpected way or have abnormal properties. Outlier Detection is critically important in the information-based society. In this paper, we propose a new definition for Outliers in rough set theory which exploits the rough membership function. An algorithm to find such Outliers in rough set theory is also given. The effectiveness of our method for Outlier Detection is demonstrated on two publicly available databases.
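
    As background for the abstract above, the following sketch computes the standard rough membership function of rough set theory: for an object x, a set X, and an attribute subset B, it is the fraction of x's indiscernibility class (objects identical to x on B) that falls inside X. It does not reproduce the paper's Outlier definition, which the abstract does not spell out; the function name, attribute names, and toy table are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Hashable, List, Set


def rough_membership(
    table: Dict[Hashable, Dict[str, str]],
    attrs: List[str],
    target: Set[Hashable],
) -> Dict[Hashable, Fraction]:
    """Map each object id x to |[x]_B & X| / |[x]_B|, where [x]_B is the set
    of objects that agree with x on every attribute in attrs."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj_id)

    membership = {}
    for members in classes.values():
        value = Fraction(len(members & target), len(members))
        for obj_id in members:
            membership[obj_id] = value
    return membership


# Hypothetical information system: four objects, two condition attributes.
table = {
    "u1": {"headache": "yes", "temp": "high"},
    "u2": {"headache": "yes", "temp": "high"},
    "u3": {"headache": "no", "temp": "normal"},
    "u4": {"headache": "yes", "temp": "high"},
}
flu = {"u1", "u2"}  # the target set X
mu = rough_membership(table, ["headache", "temp"], flu)
for obj_id in sorted(mu):
    print(obj_id, mu[obj_id])
```

    Here u1, u2, and u4 are indiscernible on the chosen attributes, so they share one membership value, while u3 forms a class of its own.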

Luigi Palopoli - One of the best experts on this subject based on the ideXlab platform.

  • Outlier Detection for simple default theories
    Artificial Intelligence, 2010
    Co-Authors: Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary, Luigi Palopoli
    Abstract:

    It was noted recently that the framework of default logics can be exploited for detecting Outliers. Outliers are observations expressed by sets of literals that feature unexpected properties. These observations are not explicitly provided in input (as it happens with abduction) but, rather, they are hidden in the given knowledge base. Unfortunately, in the two related formalisms for specifying defaults - Reiter's default logic and extended disjunctive logic programs - the most general Outlier Detection problems turn out to lie at the third level of the polynomial hierarchy. In this note, we analyze the complexity of Outlier Detection for two very simple classes of default theories, namely NU and DNU, for which the entailment problem is solvable in polynomial time. We show that, for these classes, checking for the existence of an Outlier is anyway intractable. This result contributes to further showing the inherent intractability of Outlier Detection in default reasoning.

  • Outlier Detection using default reasoning
    Artificial Intelligence, 2008
    Co-Authors: Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary, Luigi Palopoli
    Abstract:

    Default logics are usually used to describe the regular behavior and normal properties of domain elements. In this paper we suggest, conversely, that the framework of default logics can be exploited for detecting Outliers. Outliers are observations expressed by sets of literals that feature unexpected semantical characteristics. These sets of literals are selected among those explicitly embodied in the given knowledge base. Hence, essentially we perceive Outlier Detection as a knowledge discovery technique. This paper defines the notion of Outlier in two related formalisms for specifying defaults: Reiter's default logic and extended disjunctive logic programs. For each of the two formalisms, we show that finding Outliers is quite complex. Indeed, we prove that several versions of the Outlier Detection problem lie over the second level of the polynomial hierarchy. We believe that a thorough complexity analysis, as done here, is a useful preliminary step towards developing effective heuristics and exploring tractable subsets of Outlier Detection problems.

  • Outlier Detection by logic programming
    ACM Transactions on Computational Logic, 2007
    Co-Authors: Fabrizio Angiulli, Gianluigi Greco, Luigi Palopoli
    Abstract:

    The development of effective knowledge discovery techniques has become a very active research area in recent years due to the important impact it has had in several relevant application domains. One interesting task therein is that of singling out anomalous individuals from a given population, for example, to detect rare events in time-series analysis settings, or to identify objects whose behavior is deviant w.r.t. a codified standard set of rules. Such exceptional individuals are usually referred to as Outliers in the literature. In this article, the concept of Outlier is formally stated in the context of knowledge-based systems, by generalizing that originally proposed in Angiulli et al. [2003] in the context of default theories. The chosen formal framework here is that of logic programming, wherein potential applications of techniques for Outlier Detection are thoroughly discussed. The proposed formalization is a novel one and helps to shed light on the nature of Outliers occurring in logic bases. Also the exploitation of minimality criteria in Outlier Detection is illustrated. The computational complexity of Outlier Detection problems arising in this novel setting is also thoroughly investigated and accounted for in the paper. Finally, rewriting algorithms are proposed that transform any Outlier Detection problem into an equivalent inference problem under stable model semantics, thereby making Outlier computation effective and realizable on top of any stable model solver.

  • Outlier Detection by Logic Programming
    arXiv: Artificial Intelligence, 2004
    Co-Authors: Fabrizio Angiulli, Gianluigi Greco, Luigi Palopoli
    Abstract:

    The development of effective knowledge discovery techniques has become in the recent few years a very active research area due to the important impact it has in several relevant application areas. One interesting task thereof is that of singling out anomalous individuals from a given population, e.g., to detect rare events in time-series analysis settings, or to identify objects whose behavior is deviant w.r.t. a codified standard set of "social" rules. Such exceptional individuals are usually referred to as Outliers in the literature. Recently, Outlier Detection has also emerged as a relevant KR&R problem. In this paper, we formally state the concept of Outliers by generalizing in several respects an approach recently proposed in the context of default logic, for instance, by having Outliers not being restricted to single individuals but, rather, in the more general case, to correspond to entire (sub)theories. We do that within the context of logic programming and, mainly through examples, we discuss its potential practical impact in applications. The formalization we propose is a novel one and helps in shedding some light on the real nature of Outliers. Moreover, as a major contribution of this work, we illustrate the exploitation of minimality criteria in Outlier Detection. The computational complexity of Outlier Detection problems arising in this novel setting is thoroughly investigated and accounted for in the paper as well. Finally, we also propose a rewriting algorithm that transforms any Outlier Detection problem into an equivalent inference problem under the stable model semantics, thereby making Outlier computation effective and realizable on top of any stable model solver.

Feng Jiang - One of the best experts on this subject based on the ideXlab platform.

  • Some issues about Outlier Detection in rough set theory
    Expert Systems with Applications, 2009
    Co-Authors: Feng Jiang, Yuefei Sui, Cungen Cao

  • Outlier Detection based on rough membership function
    Lecture Notes in Computer Science, 2006
    Co-Authors: Feng Jiang, Yuefei Sui, Cungen Cao

Fabrizio Angiulli - One of the best experts on this subject based on the ideXlab platform.

  • Outlier Detection for simple default theories
    Artificial Intelligence, 2010
    Co-Authors: Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary, Luigi Palopoli

  • Outlier Detection using default reasoning
    Artificial Intelligence, 2008
    Co-Authors: Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary, Luigi Palopoli

  • Outlier Detection by logic programming
    ACM Transactions on Computational Logic, 2007
    Co-Authors: Fabrizio Angiulli, Gianluigi Greco, Luigi Palopoli

  • Outlier Detection by Logic Programming
    arXiv: Artificial Intelligence, 2004
    Co-Authors: Fabrizio Angiulli, Gianluigi Greco, Luigi Palopoli

Charu C. Aggarwal - One of the best experts on this subject based on the ideXlab platform.

  • Outlier Detection for Temporal Data
    2014
    Co-Authors: Manish Gupta, Jing Gao, Charu C. Aggarwal, Jiawei Han
    Abstract:

    Outlier (or anomaly) Detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. Initial research in Outlier Detection focused on time series-based Outliers (in statistics). Since then, Outlier Detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general Outlier Detection, we focus on Outlier Detection for temporal data in this book. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc., are all temporal. This stresses the need for an organized and detailed study of Outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general Outlier Detection, techniques for temporal Outlier Detection are very different. In this book, we will present an organized picture of both recent and past research in temporal Outlier Detection. We start with the basics and then lead the reader up to the main ideas in state-of-the-art Outlier Detection techniques. We motivate the importance of temporal Outlier Detection and briefly describe the challenges beyond those of general Outlier Detection. Then, we lay out a taxonomy of proposed techniques for temporal Outlier Detection. Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community Detection), network-based approaches, and spatio-temporal Outlier Detection approaches. We summarize by presenting a wide collection of applications where temporal Outlier Detection techniques have been applied to discover interesting Outliers.
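
    To make one of the statistical techniques named above concrete, here is a minimal sketch of residual-based Outlier flagging with a first-order autoregressive (AR(1)) model. It is an illustration written for this listing, not a method taken from the book: the least-squares fit, the 3 x RMS threshold, the function name, and the toy series are all assumptions, and the series is assumed to be non-constant.

```python
import math
from typing import List


def ar1_outliers(series: List[float], threshold: float = 3.0) -> List[int]:
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares and return the
    indices t whose absolute residual exceeds threshold * residual RMS."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    phi = sxy / sxx
    intercept = mean_curr - phi * mean_prev
    residuals = [c - (intercept + phi * p) for p, c in zip(prev, curr)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    # residuals[i] belongs to series[i + 1], hence the index shift.
    return [i + 1 for i, r in enumerate(residuals) if abs(r) > threshold * rms]


# Hypothetical series: a small repeating pattern with one spike at index 12.
pattern = [10.0, 10.5, 11.0, 10.5]
series = pattern * 3 + [25.0] + pattern + [10.0, 10.5, 11.0]
flagged = ar1_outliers(series)
print(flagged)
```

    Because the spike itself takes part in the fit, it inflates the residual scale; robust estimators are a common refinement, but they are outside the scope of this sketch.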

  • Tutorial: Outlier Detection for Temporal Data
    2013
    Co-Authors: Manish Gupta, Jing Gao, Charu C. Aggarwal, Jiawei Han
    Abstract:

    Outlier (or anomaly) Detection is a very broad field which has been studied in the context of a large number of research areas like statistics, data mining, sensor networks, environmental science, distributed systems, spatio-temporal mining, etc. The first few articles in Outlier Detection focused on time series-based Outliers (in statistics). Since then, Outlier Detection has been studied on a large variety of data types including high-dimensional data, uncertain data, stream data, network data, time series data, spatial data, and spatio-temporal data. While there have been many tutorials and surveys for general Outlier Detection, we focus on Outlier Detection for temporal data in this tutorial. A large number of applications generate temporal datasets. For example, in our everyday life, various kinds of records like credit, personnel, financial, judicial, medical, etc. are all temporal. This stresses the need for an organized and detailed study of Outliers with respect to such temporal data. In the past decade, there has been a lot of research on various forms of temporal data including consecutive data snapshots, series of data snapshots and data streams. Besides the initial work on time series, researchers have focused on rich forms of data including multiple data streams, spatio-temporal data, network data, community distribution data, etc. Compared to general Outlier Detection, techniques for temporal Outlier Detection are very different and include AR models, Markov models, evolutionary clustering, etc. In this tutorial, we will present an organized picture of recent research in temporal Outlier Detection. We begin by motivating the importance of temporal Outlier Detection and briefly describing the challenges beyond those of general Outlier Detection. Then, we lay out a taxonomy of proposed techniques for temporal Outlier Detection. Such techniques broadly include statistical techniques (like AR models, Markov models, histograms, neural networks), distance- and density-based approaches, grouping-based approaches (clustering, community Detection), network-based approaches, and spatio-temporal Outlier Detection approaches. We summarize by presenting a collection of applications where temporal Outlier Detection techniques have been applied to discover interesting Outliers.

  • Supervised Outlier Detection
    Outlier Analysis, 2012
    Co-Authors: Charu C. Aggarwal
    Abstract:

    The discussions in the previous chapters focus on the problem of unsupervised Outlier Detection in which no prior information is available about the abnormalities in the data. In such scenarios, many of the anomalies found correspond to noise or other uninteresting phenomena. It has been observed [338, 374, 531] in diverse applications such as system anomaly Detection, financial fraud, and Web robot Detection that interesting anomalies are often highly specific to particular types of abnormal activity in the underlying application. In such cases, an unsupervised Outlier Detection method might discover noise, which is not specific to that activity, and therefore may not be of interest to an analyst. In many cases, different types of abnormal instances could be present, and it may be desirable to distinguish among them. For example, in an intrusion-Detection scenario, different types of intrusion anomalies are possible, and the specific type of an intrusion is important information.

  • Outlier Detection in graph streams
    International Conference on Data Engineering, 2011
    Co-Authors: Charu C. Aggarwal, Yuchen Zhao
    Abstract:

    A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities which are different from the “typical” behavior of the underlying network. In this paper, we will provide first results on the problem of structural Outlier Detection in massive network streams. Such problems are inherently challenging because of the high volume of the underlying network stream. The stream scenario also increases the computational challenges for the approach. We use a structural connectivity model in order to define Outliers in graph streams. In order to handle the sparsity problem of massive networks, we dynamically partition the network in order to construct statistically robust models of the connectivity behavior. We design a reservoir sampling method in order to maintain structural summaries of the underlying network. These structural summaries are designed in order to create robust, dynamic and efficient models for Outlier Detection in graph streams. We present experimental results illustrating the effectiveness and efficiency of our approach.
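
    The abstract above mentions reservoir sampling as the basis for its structural summaries. The sketch below shows only the classic uniform reservoir sampling scheme over a stream of edges, as a reminder of the underlying idea; the paper's own structural variant is not described in the abstract and is not reproduced here. The function name and the synthetic edge stream are assumptions.

```python
import random
from typing import Iterable, List, Optional, Tuple

Edge = Tuple[int, int]


def reservoir_sample(
    stream: Iterable[Edge], k: int, rng: Optional[random.Random] = None
) -> List[Edge]:
    """Keep a uniform random sample of k items from a stream of unknown
    length, using O(k) memory and a single pass."""
    if rng is None:
        rng = random.Random()
    reservoir: List[Edge] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item  # replace with probability k / (i + 1)
    return reservoir


# Hypothetical edge stream: a path graph on 1001 nodes.
edges = [(u, u + 1) for u in range(1000)]
sample = reservoir_sample(edges, k=10, rng=random.Random(0))
print(len(sample))
```

    Each incoming edge is seen once and then either stored or discarded, which is what makes the scheme suitable for high-volume streams.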

  • Outlier Detection with uncertain data
    SIAM International Conference on Data Mining, 2008
    Co-Authors: Charu C. Aggarwal
    Abstract:

    In recent years, many new techniques have been developed for mining and managing uncertain data. This is because of the new ways of collecting data, which have resulted in enormous amounts of inconsistent or missing data. Such data is often remodeled in the form of uncertain data. In this paper, we will examine the problem of Outlier Detection with uncertain data sets. The Outlier Detection problem is particularly challenging for the uncertain case, because the Outlier-like behavior of a data point may be a result of the uncertainty added to the data point. Furthermore, the uncertainty added to the other data points may skew the overall data distribution in such a way that true Outliers may be masked. Therefore, it is critical to be able to remove the effects of the uncertainty added both at the aggregate level and at the level of individual data points. In this paper, we will examine a density-based approach to Outlier Detection, and show how to use it to remove the uncertainty from the underlying data. We present experimental results illustrating the effectiveness of the method.