Outlier

The Experts below are selected from a list of 185211 Experts worldwide ranked by ideXlab platform

Ruizhi Sun - One of the best experts on this subject based on the ideXlab platform.

  • UWFP-Outlier: an efficient frequent-pattern-based Outlier detection method for uncertain weighted data streams
    Applied Intelligence, 2020
    Co-Authors: Saihua Cai, Shangbo Hao, Ruizhi Sun
    Abstract:

    In this paper, we propose an efficient frequent-pattern-based Outlier detection method, namely, UWFP-Outlier, for identifying the implicit Outliers from uncertain weighted data streams. For reducing the time cost of the UWFP-Outlier method, in the weighted frequent pattern mining phase, we introduce the concepts of the maximal weight and maximal probability to form a compact anti-monotonic property, thereby reducing the scale of potential extensible patterns. For accurately detecting the Outliers, in the Outlier detection phase, we design two deviation indices to measure the deviation degree of each transaction in the uncertain weighted data streams by considering more factors that may influence its deviation degree; then, the transactions which have large deviation degrees are judged as Outliers. The experimental results indicate that the proposed UWFP-Outlier method can accurately detect the Outliers from uncertain weighted data streams with a lower time cost.
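A minimal sketch of the general idea behind frequent-pattern-based outlier scoring, under loud assumptions: it ignores the weights and existence probabilities that UWFP-Outlier handles, mines patterns by brute force, and uses a simple pattern-coverage score in the style of the frequent pattern outlier factor rather than the paper's two deviation indices. The names `frequent_itemsets`, `pattern_score`, and the toy transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force mining: map each frequent itemset to its relative support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                frequent[candidate] = support
                found = True
        if not found:
            break  # anti-monotonicity: no larger itemset can be frequent
    return frequent

def pattern_score(transaction, frequent):
    """Average support of the frequent patterns contained in the transaction.

    A low score means the transaction shares few frequent patterns with the
    rest of the data, i.e. it deviates more.
    """
    if not frequent:
        return 0.0
    covered = sum(s for pattern, s in frequent.items() if pattern <= transaction)
    return covered / len(frequent)

# Toy data (hypothetical): the last transaction shares no items with the rest.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"},
    {"a", "c"}, {"b", "c"}, {"d", "e"},
)]
frequent = frequent_itemsets(transactions, min_support=0.5)
scores = [pattern_score(t, frequent) for t in transactions]
most_deviant = min(range(len(transactions)), key=scores.__getitem__)
```

Here the transaction `{"d", "e"}` contains none of the frequent patterns, so it receives the lowest score and would be the first candidate Outlier.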

  • MiFI-Outlier: Minimal infrequent itemset-based Outlier detection approach on uncertain data stream
    Knowledge-Based Systems, 2020
    Co-Authors: Saihua Cai, Shangbo Hao, Gang Yuan, Ruizhi Sun
    Abstract:

    Many Outlier detection approaches have been proposed for static datasets over the past twenty years, with good results. In practice, uncertain data streams are increasingly common, but most existing Outlier detection approaches are not suited to the uncertain data stream environment. In addition, many Outlier detection approaches do not consider how frequently each element appears, so the detected Outliers do not coincide with the definition of an Outlier. Itemset-based Outlier detection approaches offer a good solution to this problem and have attracted growing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based Outlier detection approach called MiFI-Outlier is proposed to effectively detect Outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream, and an improved approach called MiFI-UDSM* is then proposed to mine these minimal infrequent itemsets more effectively using the ideas of “item cap” and “support cap”. In the Outlier detection phase, based on the mined MiFIs, three deviation indices, namely the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI) and the transaction deviation index (TDI), are defined to measure the deviation degree of each transaction, and MiFI-Outlier then uses them to identify the Outliers in the uncertain data stream. Several experimental studies are conducted on public and synthetic datasets, and the results show that the proposed approaches perform better in both the infrequent itemset mining phase and the Outlier detection phase.
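To make the notion of a minimal infrequent itemset concrete, here is a small sketch. It assumes the commonly used expected-support model for uncertain data (each item in a transaction carries an existence probability, and an itemset's expected support is the sum over transactions of the product of its items' probabilities). It is a brute-force illustration, not the matrix-based MiFI-UDSM method, and the function names, toy data, and threshold are hypothetical.

```python
from itertools import combinations
from math import prod

def expected_support(itemset, transactions):
    """Sum over transactions of the product of the items' existence probabilities."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in transactions)

def minimal_infrequent_itemsets(transactions, min_esup):
    """Itemsets that are infrequent while every proper subset is frequent."""
    items = sorted({i for t in transactions for i in t})
    previous = {frozenset()}  # the empty itemset counts as frequent
    minimal = []
    for size in range(1, len(items) + 1):
        level = set()
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Skip candidates with an infrequent subset: they cannot be minimal.
            if not all(candidate - {i} in previous for i in candidate):
                continue
            if expected_support(candidate, transactions) >= min_esup:
                level.add(candidate)
            else:
                minimal.append(candidate)
        if not level:
            break
        previous = level
    return minimal

# Toy uncertain transactions (hypothetical): item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.7},
    {"a": 0.8, "b": 0.9},
    {"a": 1.0, "c": 0.6},
    {"b": 0.7, "d": 0.4},
]
mifis = minimal_infrequent_itemsets(transactions, min_esup=1.0)
```

With this data, `{"d"}` is infrequent on its own, and `{"b", "c"}` is infrequent even though `b` and `c` are each frequent, so both are minimal infrequent itemsets. A transaction containing such itemsets would be a natural candidate for a higher deviation degree.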

  • WMFP-Outlier: An Efficient Maximal Frequent-Pattern-Based Outlier Detection Approach for Weighted Data Streams
    Information Technology And Control, 2019
    Co-Authors: Saihua Cai, Gang Yuan, Ruizhi Sun
    Abstract:

    Since Outliers are the major factors that affect accuracy in data science, many Outlier detection approaches have been proposed for effectively identifying the implicit Outliers from static datasets, thereby improving the reliability of the data. In recent years, data streams have been the main form of data, and the data elements in a data stream are not always of equal importance. However, the existing Outlier detection approaches do not consider the weight conditions; hence, these methods are not suitable for processing weighted data streams. In addition, the traditional pattern-based Outlier detection approaches incur a high time cost in the Outlier detection phase. Aiming at overcoming these problems, this paper proposes a two-phase pattern-based Outlier detection approach, namely, WMFP-Outlier, for effectively detecting the implicit Outliers from a weighted data stream, in which the maximal frequent patterns are used instead of the frequent patterns to accelerate the process of Outlier detection. In the process of maximal frequent-pattern mining, the anti-monotonicity property and MFP-array structure are used to accelerate the mining operation. In the process of Outlier detection, three deviation indices are designed for measuring the degree of abnormality of each transaction, and the transactions with the highest degrees of abnormality are judged as Outliers. Last, several experimental studies are conducted on a synthetic dataset to evaluate the performance of the proposed WMFP-Outlier approach. The results demonstrate that the accuracy of the WMFP-Outlier approach is higher compared to the existing pattern-based Outlier detection approaches, and the time cost of the Outlier detection phase of WMFP-Outlier is lower than those of the other four compared pattern-based Outlier detection approaches.
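The following sketch illustrates why working with maximal frequent patterns can shrink the pattern set that the detection phase has to examine. It is an assumption-laden toy: weights are ignored, mining is brute force rather than based on the MFP-array structure, and `frequent_itemsets`, `maximal_only`, and the sample transactions are hypothetical names and data.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute-force level-wise mining of itemsets occurring at least min_count times."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for size in range(1, len(items) + 1):
        level = {
            frozenset(combo)
            for combo in combinations(items, size)
            if sum(set(combo) <= t for t in transactions) >= min_count
        }
        if not level:
            break  # anti-monotonicity: nothing larger can be frequent
        frequent |= level
    return frequent

def maximal_only(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {p for p in frequent if not any(p < q for q in frequent)}

# Toy data (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"d"}]
frequent = frequent_itemsets(transactions, min_count=2)
maximal = maximal_only(frequent)
```

In this example seven frequent patterns collapse to the single maximal pattern `{"a", "b", "c"}`, so any per-transaction comparison against the pattern set has far fewer patterns to visit.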

Paul J M Havinga - One of the best experts on this subject based on the ideXlab platform.

  • Outlier detection techniques for wireless sensor networks: a survey
    IEEE Communications Surveys and Tutorials, 2010
    Co-Authors: Yu Zhang, Nirvana Meratnia, Paul J M Havinga
    Abstract:

    In the field of wireless sensor networks, those measurements that significantly deviate from the normal pattern of sensed data are considered as Outliers. The potential sources of Outliers include noise and errors, events, and malicious attacks on the network. Traditional Outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and specific requirements and limitations of the wireless sensor networks. This survey provides a comprehensive overview of existing Outlier detection techniques specifically developed for the wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline to select a technique suitable for the application at hand based on characteristics such as data type, Outlier type, Outlier identity, and Outlier degree.

  • Outlier detection techniques for wireless sensor networks: a survey
    CTIT technical report series, 2008
    Co-Authors: Yu Zhang, Nirvana Meratnia, Paul J M Havinga
    Abstract:

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered as Outliers. The potential sources of Outliers include noise and errors, events, and malicious attacks on the network. Traditional Outlier detection techniques are not directly applicable to wireless sensor networks due to the multivariate nature of sensor data and specific requirements and limitations of the wireless sensor networks. This survey provides a comprehensive overview of existing Outlier detection techniques specifically developed for the wireless sensor networks. Additionally, it presents a technique-based taxonomy and a decision tree to be used as a guideline to select a technique suitable for the application at hand based on characteristics such as data type, Outlier type, and Outlier degree.

Ronald E. Brown - One of the best experts on this subject based on the ideXlab platform.

  • Detecting Outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate
    BMC Bioinformatics, 2006
    Co-Authors: Harvey J Motulsky, Ronald E. Brown
    Abstract:

    Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying Outliers when fitting curves with nonlinear regression. We describe a new method for identifying Outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define Outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the Outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and Outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method falsely detects one or more Outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several Outliers, the ROUT method performs well at Outlier identification, with an average False Discovery Rate less than 1%. Our method, which combines a new method of robust nonlinear regression with a new method of Outlier identification, identifies Outliers from nonlinear curve fits with reasonable power and few false positives.
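The three-stage flow described in this abstract (robust fit, FDR-based Outlier flagging, ordinary least squares on the remaining points) can be sketched roughly as follows. This is not the ROUT implementation: the one-parameter decay model, the grid search, the fixed Lorentzian scale of 0.05, the median-based robust scale, the normal-approximation p-values, the plain Benjamini-Hochberg procedure, and the level `q=0.01` are all assumptions chosen to keep the example short and dependency-free.

```python
import math
from statistics import NormalDist, median

def model(x, k):
    """One-parameter exponential decay, a stand-in for any nonlinear curve."""
    return math.exp(-k * x)

def fit_k(xs, ys, loss):
    """Coarse grid search for the k minimizing the summed loss of the residuals."""
    grid = [i / 1000 for i in range(1, 3001)]
    return min(grid, key=lambda k: sum(loss(y - model(x, k)) for x, y in zip(xs, ys)))

def squared(r):
    """Ordinary least-squares loss (Gaussian scatter)."""
    return r * r

def lorentzian(r, scale=0.05):
    """Loss implied by Lorentzian scatter; grows only logarithmically (scale assumed)."""
    return math.log(1.0 + (r / scale) ** 2)

def benjamini_hochberg(p_values, q):
    """Indices rejected by the Benjamini-Hochberg procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    return sorted(order[:cutoff])

# Synthetic data (hypothetical): true k = 0.8, small alternating noise, one gross outlier.
xs = [0.5 * i for i in range(10)]
ys = [math.exp(-0.8 * x) + (0.01 if i % 2 == 0 else -0.01) for i, x in enumerate(xs)]
ys[4] += 0.5

k_ls = fit_k(xs, ys, squared)         # pulled away from 0.8 by the outlier
k_robust = fit_k(xs, ys, lorentzian)  # stage 1: robust fit

# Stage 2: flag outliers from the robust residuals.
residuals = [y - model(x, k_robust) for x, y in zip(xs, ys)]
robust_sd = 1.4826 * median(abs(r) for r in residuals)  # assumes residuals centered at 0
normal = NormalDist()
p_values = [2.0 * (1.0 - normal.cdf(abs(r) / robust_sd)) for r in residuals]
outliers = benjamini_hochberg(p_values, q=0.01)

# Stage 3: ordinary least squares on the points that were kept.
kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in outliers]
k_final = fit_k(*zip(*kept), squared)
```

The plain least-squares estimate `k_ls` is dragged toward the contaminated point, while `k_robust` and `k_final` stay close to the true rate, which is the behavior the abstract attributes to combining robust regression with Outlier removal.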

Saihua Cai - One of the best experts on this subject based on the ideXlab platform.

  • UWFP-Outlier: an efficient frequent-pattern-based Outlier detection method for uncertain weighted data streams
    Applied Intelligence, 2020
    Co-Authors: Saihua Cai, Shangbo Hao, Ruizhi Sun
    Abstract:

    In this paper, we propose an efficient frequent-pattern-based Outlier detection method, namely, UWFP-Outlier, for identifying the implicit Outliers from uncertain weighted data streams. For reducing the time cost of the UWFP-Outlier method, in the weighted frequent pattern mining phase, we introduce the concepts of the maximal weight and maximal probability to form a compact anti-monotonic property, thereby reducing the scale of potential extensible patterns. For accurately detecting the Outliers, in the Outlier detection phase, we design two deviation indices to measure the deviation degree of each transaction in the uncertain weighted data streams by considering more factors that may influence its deviation degree; then, the transactions which have large deviation degrees are judged as Outliers. The experimental results indicate that the proposed UWFP-Outlier method can accurately detect the Outliers from uncertain weighted data streams with a lower time cost.

  • MiFI-Outlier: Minimal infrequent itemset-based Outlier detection approach on uncertain data stream
    Knowledge-Based Systems, 2020
    Co-Authors: Saihua Cai, Shangbo Hao, Gang Yuan, Ruizhi Sun
    Abstract:

    Many Outlier detection approaches have been proposed for static datasets over the past twenty years, with good results. In practice, uncertain data streams are increasingly common, but most existing Outlier detection approaches are not suited to the uncertain data stream environment. In addition, many Outlier detection approaches do not consider how frequently each element appears, so the detected Outliers do not coincide with the definition of an Outlier. Itemset-based Outlier detection approaches offer a good solution to this problem and have attracted growing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based Outlier detection approach called MiFI-Outlier is proposed to effectively detect Outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream, and an improved approach called MiFI-UDSM* is then proposed to mine these minimal infrequent itemsets more effectively using the ideas of “item cap” and “support cap”. In the Outlier detection phase, based on the mined MiFIs, three deviation indices, namely the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI) and the transaction deviation index (TDI), are defined to measure the deviation degree of each transaction, and MiFI-Outlier then uses them to identify the Outliers in the uncertain data stream. Several experimental studies are conducted on public and synthetic datasets, and the results show that the proposed approaches perform better in both the infrequent itemset mining phase and the Outlier detection phase.

  • WMFP-Outlier: An Efficient Maximal Frequent-Pattern-Based Outlier Detection Approach for Weighted Data Streams
    Information Technology And Control, 2019
    Co-Authors: Saihua Cai, Gang Yuan, Ruizhi Sun
    Abstract:

    Since Outliers are the major factors that affect accuracy in data science, many Outlier detection approaches have been proposed for effectively identifying the implicit Outliers from static datasets, thereby improving the reliability of the data. In recent years, data streams have been the main form of data, and the data elements in a data stream are not always of equal importance. However, the existing Outlier detection approaches do not consider the weight conditions; hence, these methods are not suitable for processing weighted data streams. In addition, the traditional pattern-based Outlier detection approaches incur a high time cost in the Outlier detection phase. Aiming at overcoming these problems, this paper proposes a two-phase pattern-based Outlier detection approach, namely, WMFP-Outlier, for effectively detecting the implicit Outliers from a weighted data stream, in which the maximal frequent patterns are used instead of the frequent patterns to accelerate the process of Outlier detection. In the process of maximal frequent-pattern mining, the anti-monotonicity property and MFP-array structure are used to accelerate the mining operation. In the process of Outlier detection, three deviation indices are designed for measuring the degree of abnormality of each transaction, and the transactions with the highest degrees of abnormality are judged as Outliers. Last, several experimental studies are conducted on a synthetic dataset to evaluate the performance of the proposed WMFP-Outlier approach. The results demonstrate that the accuracy of the WMFP-Outlier approach is higher compared to the existing pattern-based Outlier detection approaches, and the time cost of the Outlier detection phase of WMFP-Outlier is lower than those of the other four compared pattern-based Outlier detection approaches.

Hanzi Wang - One of the best experts on this subject based on the ideXlab platform.

  • Conceptual space based gross Outlier removal for geometric model fitting
    2016 14th International Conference on Control Automation Robotics and Vision (ICARCV), 2016
    Co-Authors: Xing Wang, Jin Zheng, Guobao Xiao, Yan Yan, Hanzi Wang
    Abstract:

    In this paper, we propose an efficient and robust gross Outlier removal method, called the Conceptual Space based Gross Outlier Removal (CSGOR) method, to remove gross Outliers for geometric model fitting. In the proposed method, each data point is mapped to a conceptual space by computing the preference of "good" model hypotheses. In the conceptual space, the distributions of inliers and gross Outliers are significantly different. Specifically, inliers of each model instance are distributed in a subspace and they are far away from the origin of the conceptual space, while gross Outliers are distributed near the origin. In this manner, the problem of densely gross Outlier removal is formulated as a binary classification problem. The main advantage of the proposed method is that it can handle data with a large proportion of Outliers and effectively remove gross Outliers in data. Experimental results on both synthetic and real data have demonstrated the efficiency and effectiveness of the proposed method.