Minority Class

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 23091 Experts worldwide ranked by the ideXlab platform

Qisong Shi - One of the best experts on this subject based on the ideXlab platform.

  • generative oversampling and deep forest based Minority Class sensitive fault diagnosis approach
    Systems Man and Cybernetics, 2020
    Co-Authors: Rui Fan, Qisong Shi
    Abstract:

    In actual industrial production processes, various faults occur at different frequencies, so the resulting fault data may be Class imbalanced. Machine-learning-driven fault diagnosis methods therefore have to learn from imbalanced data, which lowers diagnostic accuracy and can even cause outright errors in identifying the Minority Class. To solve this problem, we present a novel Minority-Class Sensitive Fault Diagnosis approach (MSFD), which reduces the imbalance of the data and enhances the sensitivity of the diagnostic model to Minority-Class samples. Specifically, we first design a new generative oversampling method that combines a Wasserstein Generative Adversarial Network (WGAN) with the Synthetic Minority Oversampling Technique (SMOTE) to balance the whole dataset and improve the distribution of the Minority-Class samples. The WGAN learns the distribution of the Minority-Class samples and generates new Minority-Class samples to supplement the original dataset, while SMOTE is applied to the resulting dataset to further enhance the diversity of the synthetic samples and weaken the influence of the WGAN's mode collapse. In addition, a Minority-Class-aware fault Classification model based on deep forest, or multi-Grained Cascade Forest (GcForest), is developed. First, during the multi-grained scanning process, we score the forests and select those with higher scores to generate feature representations, accelerating model convergence. Second, weights are introduced for the different forests in the cascade levels to further improve the overall performance of the diagnostic model. A series of experiments verifies the effectiveness of the proposed method; the results show that our approach synthesizes new Minority-Class samples of higher quality and improves both the diagnosis performance for Minority-Class samples and the overall Classification accuracy. Moreover, on extremely imbalanced datasets, the proposed approach still maintains a relatively high recognition rate for Minority-Class samples.
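    The two-stage oversampling idea described above can be sketched with a Gaussian stand-in for the generator (training an actual WGAN is out of scope here, so the first stage is only a placeholder for it); `generative_oversample`, `n_gen` and `n_smote` are illustrative names, not the authors' API:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def generative_oversample(X_min, n_gen, n_smote, k=5):
        """Illustrative two-stage oversampling: a stand-in generator
        (Gaussian fit to the minority class, replacing the paper's WGAN)
        followed by SMOTE-style interpolation on the enlarged set."""
        # Stage 1: sample from a Gaussian fitted to the minority class
        # (the paper trains a WGAN here; this is only a placeholder).
        mu, cov = X_min.mean(axis=0), np.cov(X_min, rowvar=False)
        generated = rng.multivariate_normal(mu, cov, size=n_gen)
        pool = np.vstack([X_min, generated])

        # Stage 2: SMOTE - interpolate between a random seed point and one
        # of its k nearest neighbours to diversify the synthetic set.
        synth = []
        for _ in range(n_smote):
            i = rng.integers(len(pool))
            d = np.linalg.norm(pool - pool[i], axis=1)
            nbrs = np.argsort(d)[1:k + 1]        # k nearest, excluding self
            j = rng.choice(nbrs)
            lam = rng.random()                   # interpolation factor
            synth.append(pool[i] + lam * (pool[j] - pool[i]))
        return np.vstack([pool, np.array(synth)])

    X_min = rng.normal(size=(20, 3))             # toy minority samples
    X_bal = generative_oversample(X_min, n_gen=10, n_smote=15)
    print(X_bal.shape)                           # (45, 3)
    ```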

  • SMC - Generative Oversampling and Deep Forest based Minority-Class Sensitive Fault Diagnosis Approach
    2020 IEEE International Conference on Systems Man and Cybernetics (SMC), 2020
    Co-Authors: Rui Fan, Qisong Shi
    Abstract:

    In actual industrial production processes, various faults occur at different frequencies, so the resulting fault data may be Class imbalanced. Machine-learning-driven fault diagnosis methods therefore have to learn from imbalanced data, which lowers diagnostic accuracy and can even cause outright errors in identifying the Minority Class. To solve this problem, we present a novel Minority-Class Sensitive Fault Diagnosis approach (MSFD), which reduces the imbalance of the data and enhances the sensitivity of the diagnostic model to Minority-Class samples. Specifically, we first design a new generative oversampling method that combines a Wasserstein Generative Adversarial Network (WGAN) with the Synthetic Minority Oversampling Technique (SMOTE) to balance the whole dataset and improve the distribution of the Minority-Class samples. The WGAN learns the distribution of the Minority-Class samples and generates new Minority-Class samples to supplement the original dataset, while SMOTE is applied to the resulting dataset to further enhance the diversity of the synthetic samples and weaken the influence of the WGAN's mode collapse. In addition, a Minority-Class-aware fault Classification model based on deep forest, or multi-Grained Cascade Forest (GcForest), is developed. First, during the multi-grained scanning process, we score the forests and select those with higher scores to generate feature representations, accelerating model convergence. Second, weights are introduced for the different forests in the cascade levels to further improve the overall performance of the diagnostic model. A series of experiments verifies the effectiveness of the proposed method; the results show that our approach synthesizes new Minority-Class samples of higher quality and improves both the diagnosis performance for Minority-Class samples and the overall Classification accuracy. Moreover, on extremely imbalanced datasets, the proposed approach still maintains a relatively high recognition rate for Minority-Class samples.

Rui Fan - One of the best experts on this subject based on the ideXlab platform.

  • generative oversampling and deep forest based Minority Class sensitive fault diagnosis approach
    Systems Man and Cybernetics, 2020
    Co-Authors: Rui Fan, Qisong Shi
    Abstract:

    In actual industrial production processes, various faults occur at different frequencies, so the resulting fault data may be Class imbalanced. Machine-learning-driven fault diagnosis methods therefore have to learn from imbalanced data, which lowers diagnostic accuracy and can even cause outright errors in identifying the Minority Class. To solve this problem, we present a novel Minority-Class Sensitive Fault Diagnosis approach (MSFD), which reduces the imbalance of the data and enhances the sensitivity of the diagnostic model to Minority-Class samples. Specifically, we first design a new generative oversampling method that combines a Wasserstein Generative Adversarial Network (WGAN) with the Synthetic Minority Oversampling Technique (SMOTE) to balance the whole dataset and improve the distribution of the Minority-Class samples. The WGAN learns the distribution of the Minority-Class samples and generates new Minority-Class samples to supplement the original dataset, while SMOTE is applied to the resulting dataset to further enhance the diversity of the synthetic samples and weaken the influence of the WGAN's mode collapse. In addition, a Minority-Class-aware fault Classification model based on deep forest, or multi-Grained Cascade Forest (GcForest), is developed. First, during the multi-grained scanning process, we score the forests and select those with higher scores to generate feature representations, accelerating model convergence. Second, weights are introduced for the different forests in the cascade levels to further improve the overall performance of the diagnostic model. A series of experiments verifies the effectiveness of the proposed method; the results show that our approach synthesizes new Minority-Class samples of higher quality and improves both the diagnosis performance for Minority-Class samples and the overall Classification accuracy. Moreover, on extremely imbalanced datasets, the proposed approach still maintains a relatively high recognition rate for Minority-Class samples.

  • SMC - Generative Oversampling and Deep Forest based Minority-Class Sensitive Fault Diagnosis Approach
    2020 IEEE International Conference on Systems Man and Cybernetics (SMC), 2020
    Co-Authors: Rui Fan, Qisong Shi
    Abstract:

    In actual industrial production processes, various faults occur at different frequencies, so the resulting fault data may be Class imbalanced. Machine-learning-driven fault diagnosis methods therefore have to learn from imbalanced data, which lowers diagnostic accuracy and can even cause outright errors in identifying the Minority Class. To solve this problem, we present a novel Minority-Class Sensitive Fault Diagnosis approach (MSFD), which reduces the imbalance of the data and enhances the sensitivity of the diagnostic model to Minority-Class samples. Specifically, we first design a new generative oversampling method that combines a Wasserstein Generative Adversarial Network (WGAN) with the Synthetic Minority Oversampling Technique (SMOTE) to balance the whole dataset and improve the distribution of the Minority-Class samples. The WGAN learns the distribution of the Minority-Class samples and generates new Minority-Class samples to supplement the original dataset, while SMOTE is applied to the resulting dataset to further enhance the diversity of the synthetic samples and weaken the influence of the WGAN's mode collapse. In addition, a Minority-Class-aware fault Classification model based on deep forest, or multi-Grained Cascade Forest (GcForest), is developed. First, during the multi-grained scanning process, we score the forests and select those with higher scores to generate feature representations, accelerating model convergence. Second, weights are introduced for the different forests in the cascade levels to further improve the overall performance of the diagnostic model. A series of experiments verifies the effectiveness of the proposed method; the results show that our approach synthesizes new Minority-Class samples of higher quality and improves both the diagnosis performance for Minority-Class samples and the overall Classification accuracy. Moreover, on extremely imbalanced datasets, the proposed approach still maintains a relatively high recognition rate for Minority-Class samples.

Jerzy Stefanowski - One of the best experts on this subject based on the ideXlab platform.

  • types of Minority Class examples and their influence on learning Classifiers from imbalanced data
    Intelligent Information Systems, 2016
    Co-Authors: Krystyna Napierala, Jerzy Stefanowski
    Abstract:

    Many real-world applications reveal difficulties in learning Classifiers from imbalanced data. Although several methods for improving Classifiers have been introduced, identifying the conditions under which a particular method works well is still an open research problem. It is also worthwhile to study the nature of imbalanced data and the characteristics of the Minority Class distribution, and their influence on Classification performance. However, current studies on the difficulty factors of imbalanced data have mainly been done with artificial datasets, and their conclusions are not easily applicable to real-world problems, partly because the methods for identifying these factors are not sufficiently developed. In this paper, we capture the difficulties of the Class distribution in real datasets by considering four types of Minority Class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, based on analyzing the Class distribution in the local neighbourhood of each example. Two ways of modeling this neighbourhood are presented: with k-nearest examples and with kernel functions. Experiments with artificial datasets show that these methods are able to re-discover the simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, where (1) we identify new data characteristics based on the analysis of the types of Minority examples, and (2) we demonstrate that the results of this analysis make it possible to differentiate the Classification performance of popular Classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis to develop new algorithms for learning Classifiers and new pre-processing methods.
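    The k-nearest-neighbour variant of the identification method can be sketched as follows. The thresholds correspond to the labelling rule commonly cited for k = 5 (at least 4 Minority neighbours: safe; 2-3: borderline; exactly 1: rare; 0: outlier); the function name is illustrative:

    ```python
    import numpy as np

    def minority_example_types(X, y, minority=1, k=5):
        """Label each minority example as safe / borderline / rare / outlier
        from the class mix of its k nearest neighbours, following the
        thresholds used by Napierala and Stefanowski for k = 5."""
        types = {}
        for i in np.where(y == minority)[0]:
            d = np.linalg.norm(X - X[i], axis=1)
            nbrs = np.argsort(d)[1:k + 1]        # k nearest, excluding self
            same = int(np.sum(y[nbrs] == minority))
            if same >= 4:
                types[i] = "safe"
            elif same >= 2:
                types[i] = "borderline"
            elif same == 1:
                types[i] = "rare"
            else:
                types[i] = "outlier"
        return types

    rng = np.random.default_rng(1)
    X_min = rng.normal(0.0, 0.1, size=(6, 2))    # tight minority cluster
    X_maj = rng.normal(5.0, 0.1, size=(10, 2))   # majority cluster
    lone = np.array([[5.0, 5.0]])                # minority point inside it
    X = np.vstack([X_min, X_maj, lone])
    y = np.array([1] * 6 + [0] * 10 + [1])
    types = minority_example_types(X, y)
    print(types[0], types[16])                   # safe outlier
    ```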

  • Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data
    Emerging Paradigms in Machine Learning, 2013
    Co-Authors: Jerzy Stefanowski
    Abstract:

    This paper deals with inducing Classifiers from imbalanced data, where one Class (the Minority Class) is under-represented in comparison with the remaining (majority) Classes. The Minority Class is usually of primary interest, and its members should be recognized as accurately as possible. Class imbalance is a difficulty for most algorithms learning Classifiers, as they are biased toward the majority Classes. The first part of this study discusses the main properties of data that cause this difficulty. Following a review of earlier, related research, several types of artificial imbalanced datasets affected by critical factors were generated, and decision-tree and rule-based Classifiers were induced from them. The first experiments show that a small number of Minority-Class examples is not by itself the main source of difficulty. These results confirm the initial hypothesis that the degradation of Classification performance is more related to the decomposition of the Minority Class into small sub-parts. Another critical factor is the presence of a relatively large number of borderline Minority-Class examples in the overlapping region between Classes, in particular for non-linear decision boundaries. A novel observation is the impact of rare Minority-Class examples located inside the majority Class. The experiments show that a stepwise increase in the number of borderline and rare examples in the Minority Class has a larger influence on the considered Classifiers than an increase in the decomposition of this Class. The second part of the paper studies the improvement of Classifiers by pre-processing such data with resampling methods. Further experiments examine the influence of the identified critical data factors on the performance of four different pre-processing resampling methods: two versions of random over-sampling, the focused under-sampling method NCR, and the hybrid method SPIDER. The results show that if the data are sufficiently disturbed by borderline and rare examples, SPIDER and, to a lesser extent, NCR work better than over-sampling.

  • identification of different types of Minority Class examples in imbalanced data
    Hybrid Artificial Intelligence Systems, 2012
    Co-Authors: Krystyna Napierala, Jerzy Stefanowski
    Abstract:

    The characteristics of the Minority Class distribution in imbalanced data are studied. Four types of Minority examples --- safe, borderline, rare and outlier --- are distinguished and analysed. We propose a new method for identifying these examples in the data, based on analysing the local neighbourhoods of the examples. Its application to imbalanced UCI datasets shows that the Minority Class is often scattered, without many safe examples. This characteristic of the data distributions is also confirmed by a complementary analysis with Multidimensional Scaling visualization. We examine the influence of these types of examples on six different Classifiers learned over various real-world datasets. The experimental results show that the particular Classifiers differ in their sensitivity to the type of examples.

  • HAIS (2) - Identification of different types of Minority Class examples in imbalanced data
    Lecture Notes in Computer Science, 2012
    Co-Authors: Krystyna Napierala, Jerzy Stefanowski
    Abstract:

    The characteristics of the Minority Class distribution in imbalanced data are studied. Four types of Minority examples --- safe, borderline, rare and outlier --- are distinguished and analysed. We propose a new method for identifying these examples in the data, based on analysing the local neighbourhoods of the examples. Its application to imbalanced UCI datasets shows that the Minority Class is often scattered, without many safe examples. This characteristic of the data distributions is also confirmed by a complementary analysis with Multidimensional Scaling visualization. We examine the influence of these types of examples on six different Classifiers learned over various real-world datasets. The experimental results show that the particular Classifiers differ in their sensitivity to the type of examples.

  • local neighbourhood extension of smote for mining imbalanced data
    Computational Intelligence and Data Mining, 2011
    Co-Authors: Tomasz Maciejewski, Jerzy Stefanowski
    Abstract:

    In this paper we discuss the problems of inducing Classifiers from imbalanced data and improving the recognition of the Minority Class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic Minority-Class examples between the closest neighbours from this Class. However, SMOTE can overgeneralize the Minority-Class region, as it does not consider the distribution of neighbours from the majority Classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline-SMOTE and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision-tree or Naive Bayes Classifiers. The results show that the new LN-SMOTE method improves the evaluation measures for the Minority Class.
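    A minimal sketch of the neighbourhood-aware idea behind these SMOTE variants: interpolation is biased toward whichever endpoint has the higher "safe level" (more Minority neighbours), keeping synthetic points away from majority regions. This illustrates the principle rather than the exact LN-SMOTE algorithm, and all names are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def safe_level(X, y, idx, minority=1, k=5):
        """Number of minority examples among the k nearest neighbours."""
        d = np.linalg.norm(X - X[idx], axis=1)
        nbrs = np.argsort(d)[1:k + 1]
        return int(np.sum(y[nbrs] == minority))

    def neighbourhood_aware_smote(X, y, n_new, minority=1, k=5):
        """Sketch of neighbourhood-aware oversampling: synthetic points are
        interpolated toward the endpoint with the higher safe level, so
        generation is biased away from majority regions.  Illustrates the
        idea behind Safe-Level/LN-SMOTE, not the authors' exact method."""
        min_idx = np.where(y == minority)[0]
        synth = []
        while len(synth) < n_new:
            i = rng.choice(min_idx)
            d = np.linalg.norm(X[min_idx] - X[i], axis=1)
            j = min_idx[np.argsort(d)[1]]        # nearest minority neighbour
            sl_i, sl_j = safe_level(X, y, i), safe_level(X, y, j)
            if sl_i == 0 and sl_j == 0:
                continue                          # both unsafe: skip pair
            # interpolation factor weighted toward the safer endpoint
            lam = rng.random() * sl_j / (sl_i + sl_j)
            synth.append(X[i] + lam * (X[j] - X[i]))
        return np.array(synth)

    X = np.vstack([rng.normal(0, 0.1, size=(8, 2)),   # minority cluster
                   rng.normal(5, 0.1, size=(12, 2))]) # majority cluster
    y = np.array([1] * 8 + [0] * 12)
    S = neighbourhood_aware_smote(X, y, n_new=5)
    print(S.shape)                                    # (5, 2)
    ```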

Payel Sadhukhan - One of the best experts on this subject based on the ideXlab platform.

  • Adaptive Learning of Minority Class prior to Minority Oversampling
    Pattern Recognition Letters, 2020
    Co-Authors: Payel Sadhukhan, Sarbani Palit
    Abstract:

    Minority oversampling techniques have substantiated their appropriateness and utility in the domain of Class-imbalance learning. However, they do not guarantee the true Class of the synthetic Minority points. In this work, Adaptive Learning of the Minority Class prior to Minority Oversampling (ALMCMO), we bridge this gap by estimating the Minority set before generating the synthetic points. We estimate a varying, adaptive volume of Minority space around each Minority point and sample the synthetic points from these estimated Minority spaces, aiming to guarantee their Class memberships. In our empirical study we use six competing methods, 23 datasets and two Classifiers. The results indicate a clear superiority of the proposed method over the six competing schemes.
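    The "estimate a Minority region before oversampling" principle can be sketched as follows: around each Minority point, take a ball whose radius is a fraction of the distance to the nearest majority point, and draw synthetic samples only inside these balls. This illustrates the idea, not the paper's exact estimator; `shrink` and the function name are hypothetical:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def oversample_in_minority_space(X_min, X_maj, n_new, shrink=0.5):
        """Sketch of adaptive minority-space estimation: each minority point
        gets a safe ball whose radius is `shrink` times the distance to its
        nearest majority point; synthetic samples are drawn uniformly from
        randomly chosen balls, so they stay inside minority territory."""
        # adaptive radius: closer majority points -> smaller safe region
        radii = np.array([np.linalg.norm(X_maj - p, axis=1).min()
                          for p in X_min]) * shrink
        seeds = rng.integers(len(X_min), size=n_new)
        dirs = rng.normal(size=(n_new, X_min.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
        steps = rng.random(n_new) ** (1 / X_min.shape[1])    # uniform in ball
        return X_min[seeds] + dirs * (steps * radii[seeds])[:, None]

    X_min = rng.normal(0.0, 0.2, size=(10, 2))   # toy minority cluster
    X_maj = rng.normal(5.0, 0.2, size=(15, 2))   # toy majority cluster
    S = oversample_in_minority_space(X_min, X_maj, n_new=20)
    print(S.shape)                               # (20, 2)
    ```

    By the triangle inequality, every synthetic point ends up closer to its seed than to any majority point, which is the Class-membership guarantee the sketch aims to illustrate.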

  • learning Minority Class prior to Minority oversampling
    International Joint Conference on Neural Network, 2019
    Co-Authors: Payel Sadhukhan
    Abstract:

    The success of Minority oversampling in dealing with Class-imbalanced datasets is well demonstrated by existing approaches. However, these approaches do not guarantee the true Class of the synthetic Minority points. We address this issue in this paper with Learning the Minority Class prior to Minority Oversampling (LMCMO). To guarantee the Class information of the synthetic Minority points, we estimate the Minority spaces before generating them. The performance of the LMCMO-oversampled datasets is tested with a C4.5 decision tree and a linear Support Vector Machine (SVM) Classifier. Empirical evaluations on 21 datasets using four diverse metrics indicate a substantial improvement in the linear SVM results of the proposed method over four competing methods. A modest but still significant gain is achieved by our method over the other methods when Classifying with the C4.5 decision tree.

  • IJCNN - Learning Minority Class prior to Minority Oversampling
    2019 International Joint Conference on Neural Networks (IJCNN), 2019
    Co-Authors: Payel Sadhukhan
    Abstract:

    The success of Minority oversampling in dealing with Class-imbalanced datasets is well demonstrated by existing approaches. However, these approaches do not guarantee the true Class of the synthetic Minority points. We address this issue in this paper with Learning the Minority Class prior to Minority Oversampling (LMCMO). To guarantee the Class information of the synthetic Minority points, we estimate the Minority spaces before generating them. The performance of the LMCMO-oversampled datasets is tested with a C4.5 decision tree and a linear Support Vector Machine (SVM) Classifier. Empirical evaluations on 21 datasets using four diverse metrics indicate a substantial improvement in the linear SVM results of the proposed method over four competing methods. A modest but still significant gain is achieved by our method over the other methods when Classifying with the C4.5 decision tree.

Diane J. Cook - One of the best experts on this subject based on the ideXlab platform.

  • racog and wracog two probabilistic oversampling techniques
    IEEE Transactions on Knowledge and Data Engineering, 2015
    Co-Authors: Barnan Das, Narayanan C. Krishnan, Diane J. Cook
    Abstract:

    As machine learning techniques mature and are used to tackle complex scientific problems, challenges arise, such as the imbalanced Class distribution problem, where one of the target Class labels is under-represented in comparison with the other Classes. Existing oversampling approaches for addressing this problem typically do not consider the probability distribution of the Minority Class while synthetically generating new samples. As a result, the Minority Class is not well represented, which leads to high misClassification error. We introduce two Gibbs-sampling-based oversampling approaches, RACOG and wRACOG, to synthetically generate and strategically select new Minority-Class samples. The Gibbs sampler uses the joint probability distribution of the data attributes to generate new Minority-Class samples in the form of a Markov chain. RACOG selects samples from the Markov chain at a predefined lag, while wRACOG selects those samples that have the highest probability of being misClassified by the existing learning model. We validate our approach using five UCI datasets that were carefully modified to exhibit Class imbalance, and one new application-domain dataset with inherent extreme Class imbalance. In addition, we compare the Classification performance of the proposed methods with three other existing resampling techniques.
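    The lag-based sampling scheme can be sketched for binary attributes as follows. For brevity this sketch conditions each attribute only on its predecessor (a chain approximation estimated with Laplace smoothing), whereas the paper uses a richer dependence structure over the joint distribution; the function and parameter names are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_oversample(X_min, n_new, lag=10, burn_in=50):
        """Sketch of Gibbs-sampling-based oversampling in the spirit of
        RACOG: resample one binary attribute at a time from a conditional
        estimated on the minority data, and keep every `lag`-th state of
        the chain after a burn-in period."""
        n, d = X_min.shape
        # p[i][a] ~ P(x_i = 1 | x_{i-1} = a), with Laplace smoothing
        p = np.zeros((d, 2))
        p[0, :] = (X_min[:, 0].sum() + 1) / (n + 2)   # marginal for x_0
        for i in range(1, d):
            for a in (0, 1):
                rows = X_min[X_min[:, i - 1] == a]
                p[i, a] = (rows[:, i].sum() + 1) / (len(rows) + 2)

        state = X_min[rng.integers(n)].copy()          # chain start
        out = []
        for step in range(burn_in + n_new * lag):
            for i in range(d):                         # one Gibbs sweep
                prev = int(state[i - 1]) if i > 0 else 0
                state[i] = rng.random() < p[i, prev]
            if step >= burn_in and (step - burn_in) % lag == 0:
                out.append(state.copy())               # keep lagged states
        return np.array(out[:n_new])

    Xb = (rng.random((30, 4)) < 0.5).astype(int)       # toy binary minority data
    S = gibbs_oversample(Xb, n_new=8)
    print(S.shape)                                     # (8, 4)
    ```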

  • ICDM - wRACOG: A Gibbs Sampling-Based Oversampling Technique
    2013 IEEE 13th International Conference on Data Mining, 2013
    Co-Authors: Barnan Das, Narayanan C. Krishnan, Diane J. Cook
    Abstract:

    As machine learning techniques mature and are used to tackle complex scientific problems, challenges arise, such as the imbalanced Class distribution problem, where one of the target Class labels is under-represented in comparison with the other Classes. Existing oversampling approaches for addressing this problem typically do not consider the probability distribution of the Minority Class while synthetically generating new samples. As a result, the Minority Class is not well represented, which leads to high misClassification error. We introduce wRACOG, a Gibbs-sampling-based oversampling approach for synthetically generating and strategically selecting new Minority-Class samples. The Gibbs sampler uses the joint probability distribution of the data attributes to generate new Minority-Class samples in the form of a Markov chain. wRACOG iteratively learns a model by selecting samples from the Markov chain that have the highest probability of being misClassified. We validate the effectiveness of wRACOG using five UCI datasets and one new application-domain dataset. A comparative study of wRACOG with three other well-known resampling methods provides evidence that wRACOG offers a clear improvement in Classification accuracy for Minority-Class samples over the other methods.