Classification Performance

The Experts below are selected from a list of 328,347 Experts worldwide ranked by the ideXlab platform

Weike Nie - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing the prototypes with a novel data weighting algorithm for enhancing the Classification Performance of fuzzy clustering
    Fuzzy Sets and Systems, 2020
    Co-Authors: Witold Pedrycz, Weike Nie
    Abstract:

    Fuzzy clustering is regarded as an unsupervised learning process that constitutes a prerequisite for many other data mining techniques. Deciding how to classify data efficiently and accurately has been one of the topics pursued by many researchers. We anticipate that the Classification Performance of clustering is strongly dependent on the boundary data (viz. data located at the boundaries of the clusters). The boundary data carry some level of uncertainty and as such contain more information than other data; usually, the greater the uncertainty, the more information such data contain. To improve the quality of clustering, this study develops an augmented scheme of fuzzy clustering in which a novel weighted data-based fuzzy clustering is proposed. In the introduced scheme, a dataset is composed of boundary data and non-boundary data. The partition matrix is used to determine the boundary data and the non-boundary data to be considered subsequently in the clustering process. Then, we assign a different weight to each datum to construct the weighted data. During this process, the weights for the boundary data and the non-boundary data differ, so that the contributions of the boundary data and the non-boundary data to the prototypes are reduced and enhanced, respectively. Furthermore, we build a weighting function to determine the weights of the data. The weighted data are used to optimize the prototypes. With the optimized prototypes, the partition matrix can be refined, which ultimately optimizes the boundaries of the clusters. Finally, the Classification Performance of fuzzy clustering is enhanced. We offer a thorough analysis of the developed scheme. Comprehensive experimental studies involving synthetic and publicly available datasets are reported to demonstrate the Performance of the proposed approach.
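
    The weighting idea can be illustrated with a minimal sketch, assuming boundary data are flagged by a low maximum membership in the partition matrix and given a fixed reduced weight; the paper's actual weighting function is more elaborate, so treat this only as an illustration of how down-weighted boundary data enter the prototype update.

    ```python
    # Minimal sketch of weighted fuzzy C-means: boundary data (low maximum
    # membership) get a reduced weight in the prototype update. The boundary
    # rule and the fixed weight are assumptions, not the paper's definitions.
    import numpy as np

    def weighted_fcm(X, c=3, m=2.0, n_iter=100, boundary_weight=0.5, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        U = rng.random((c, n))
        U /= U.sum(axis=0)                                   # partition matrix, columns sum to 1
        for _ in range(n_iter):
            # Low maximum membership -> high uncertainty -> boundary datum (assumed rule).
            is_boundary = U.max(axis=0) < (1.0 / c + 0.2)
            w = np.where(is_boundary, boundary_weight, 1.0)  # reduce boundary contribution
            Um = (U ** m) * w                                # weighted fuzzified memberships
            V = (Um @ X) / Um.sum(axis=1, keepdims=True)     # prototype update on weighted data
            # Standard FCM membership update from the refined prototypes.
            d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
            U = d ** (-2.0 / (m - 1.0))
            U /= U.sum(axis=0)
        return V, U

    # V, U = weighted_fcm(np.random.default_rng(1).random((200, 2)), c=3)
    ```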

  • Constructing a Virtual Space for Enhancing the Classification Performance of Fuzzy Clustering
    IEEE Transactions on Fuzzy Systems, 2019
    Co-Authors: Witold Pedrycz, Weike Nie
    Abstract:

    Clustering offers a general methodology and comes with a remarkably rich conceptual and algorithmic framework for data analysis and data interpretation. As one of the most representative algorithms of fuzzy clustering, fuzzy C-means (FCM) is a widely used objective function-based clustering method exploited in various applications. In this study, a virtual-space-based fuzzy clustering algorithm is proposed to improve the Classification Performance obtained with fuzzy clustering. This improvement is achieved by forming a virtual space based on the original data space. First, we construct a piecewise linear transformation function to modify the similarity matrix of the original data and build the so-called virtual similarity matrix (VSM). In the VSM, the effect of closeness becomes amplified; in other words, high similarity values (say, larger than α, the cutoff value separating large and small similarities in this paper) present in the original similarity matrix are made higher, whereas lower similarity levels (say, smaller than α) are further reduced. In addition, data with high similarity (say, larger than a certain threshold value) observed in the original space will overlap significantly in the virtual space (the attributes of the samples become exactly the same); the overlapping samples can be treated as one sample. This modification makes it easier to identify clusters. Second, we build a relationship matrix between the original dataset and the determined similarity values and present two closed-form solutions to the problem of building the relationship matrix. Subsequently, a virtual space of the original data space is derived through the modified similarity matrix and the introduced relationship matrix. We offer a thorough analysis of the developed clustering algorithm. The experimental results are in agreement with the underlying conceptual basis. Furthermore, the resulting Classification Performance is significantly improved compared with the results produced by FCM and kernel-based fuzzy C-means.
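
    A minimal sketch of the piecewise linear transformation is given below; the symmetric stretch around the cutoff α and the gain value are assumptions, since the paper's exact transformation parameters are not reproduced here.

    ```python
    # Sketch of building a "virtual similarity matrix": similarities above the
    # cutoff alpha are amplified, those below are suppressed, via a piecewise
    # linear map. The slope (gain) is a placeholder choice.
    import numpy as np

    def virtual_similarity(S, alpha=0.5, gain=2.0):
        S = np.asarray(S, dtype=float)
        high = np.clip(alpha + gain * (S - alpha), 0.0, 1.0)  # push large similarities higher
        low = np.clip(alpha - gain * (alpha - S), 0.0, 1.0)   # push small similarities lower
        return np.where(S >= alpha, high, low)

    # Example: similarities derived from pairwise Euclidean distances.
    X = np.random.default_rng(0).random((5, 2))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    S = 1.0 - D / D.max()                  # crude similarity in [0, 1]
    VSM = virtual_similarity(S)            # the "virtual similarity matrix"
    ```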

Amri Napolitano - One of the best experts on this subject based on the ideXlab platform.

  • ICTAI - Using Feature Selection in Combination with Ensemble Learning Techniques to Improve Tweet Sentiment Classification Performance
    2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), 2015
    Co-Authors: Joseph D. Prusa, Taghi M. Khoshgoftaar, Amri Napolitano
    Abstract:

    Performing sentiment analysis of tweets by training a classifier is a challenging and complex task, requiring that the classifier correctly and reliably identify the emotional polarity of a tweet. Poor data quality, due to class imbalance or mislabeled instances, may negatively impact Classification Performance. Ensemble learning techniques combine multiple models in an attempt to improve Classification Performance, especially on poor-quality or imbalanced data; however, these techniques do not address the high dimensionality of tweet sentiment data and may require a prohibitive amount of resources to train on high-dimensional data. This work addresses these issues by studying bagging and boosting combined with feature selection. These two techniques, denoted Select-Bagging and Select-Boost, seek to address both poor data quality and high dimensionality. We compare the Performance of Select-Bagging and Select-Boost against feature selection alone. These techniques are tested with four base learners, two datasets, and ten feature subset sizes. Our results show that Select-Boost offers the highest Performance, is significantly better than using no ensemble technique, and is significantly better than Select-Bagging for most learners on both datasets. To the best of our knowledge, this is the first study to focus on the effects of using ensemble learning in combination with feature selection for the purpose of tweet sentiment Classification.
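
    A rough scikit-learn analogue of the Select-Boost idea is sketched below; this is not the authors' implementation, and the feature ranker (chi-squared), base learner, and subset size are placeholder choices.

    ```python
    # Feature selection followed by boosting on the reduced feature space,
    # in the spirit of "Select-Boost" (placeholder components, not the paper's).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.pipeline import Pipeline

    select_boost = Pipeline([
        ("vectorize", CountVectorizer(binary=True)),     # bag-of-words tweet features
        ("select", SelectKBest(chi2, k=100)),            # rank features, keep a small subset
        ("boost", AdaBoostClassifier(n_estimators=50)),  # boosting on the reduced space
    ])

    # tweets, labels = load_tweet_sentiment_data()       # hypothetical data loader
    # select_boost.fit(tweets, labels)
    ```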

  • ICTAI - Maximizing Classification Performance for Patient Response Datasets
    2013 IEEE 25th International Conference on Tools with Artificial Intelligence, 2013
    Co-Authors: David J. Dittman, Taghi M. Khoshgoftaar, Randall Wald, Amri Napolitano
    Abstract:

    The ability to predict a patient's response to a treatment has long been a goal in the fields of medicine and pharmacology. This is especially true for cancer treatments, as many of these incur extreme side effects as a consequence of destroying healthy cells along with cancerous ones. Gene profiles such as DNA microarrays could potentially contain information on which treatments are most likely to work with minimal side effects. However, DNA microarray datasets can be challenging due to the large number of features (genes) per sample, many of which are irrelevant or redundant. Techniques from the domain of data mining may help both identify the most important features and build Classification models using those features. This paper is a comprehensive study on the relative Performance of many different feature selection approaches and Classification models when applied to fifteen patient response datasets. We use six classifiers along with twelve feature subset sizes and twenty-five feature selection techniques. Our results show that the Random Forest classifier is the top-performing classifier in terms of both average results across all feature selection techniques and when using the best-performing feature selection technique, and it also had the smallest range between the best- and worst-performing feature selection techniques. Additionally, we found that for both the average and the best feature selection technique Performance, as the feature subset size increases, the Classification Performance increases. Finally, we found that different feature selection techniques dominated Performance for different feature subset sizes, and likewise the worst performers also depended on the chosen feature subset size. Statistical analysis was conducted to further validate our results. Overall, based on our results we would recommend the use of Random Forest along with a feature selection technique (the choice not being statistically significant) that reduces the feature set to around 1000 features, in order to both maximize Classification Performance and remove one step (choosing an appropriate feature ranking technique) from the process.
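
    The recommended setup (a feature ranker reducing the data to roughly 1000 genes, followed by Random Forest) can be sketched as below; the ANOVA F-test ranker stands in for the paper's twenty-five techniques and is an assumption.

    ```python
    # Reduce a microarray dataset to ~1000 top-ranked genes, then classify
    # with Random Forest; the ranker choice is a placeholder.
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score

    pipeline = Pipeline([
        ("select", SelectKBest(f_classif, k=1000)),   # keep ~1000 top-ranked genes
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ])

    # X: (n_patients, n_genes) expression matrix, y: treatment-response labels
    # scores = cross_val_score(pipeline, X, y, cv=5)
    ```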

  • RUSBoost: Improving Classification Performance when Training Data is Skewed
    International Conference on Pattern Recognition, 2008
    Co-Authors: C Seiffert, Taghi M. Khoshgoftaar, J Van Hulse, Amri Napolitano
    Abstract:

    Constructing Classification models using skewed training data can be a challenging task. We present RUSBoost, a new algorithm for alleviating the problem of class imbalance. RUSBoost combines data sampling and boosting, providing a simple and efficient method for improving Classification Performance when training data is imbalanced. In addition to performing favorably when compared to SMOTEBoost (another hybrid sampling/boosting algorithm), RUSBoost is computationally less expensive than SMOTEBoost and results in significantly shorter model training times. This combination of simplicity, speed and Performance makes RUSBoost an excellent technique for learning from imbalanced data.
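
    The algorithm is available as a third-party implementation in the imbalanced-learn library; a brief usage sketch (not the authors' original code) is shown below.

    ```python
    # RUSBoost = random undersampling of the majority class inside each
    # AdaBoost round; imbalanced-learn provides an implementation of the idea.
    from imblearn.ensemble import RUSBoostClassifier
    from sklearn.datasets import make_classification

    # A skewed two-class problem: roughly 95% majority, 5% minority.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

    clf = RUSBoostClassifier(n_estimators=50, random_state=0)
    clf.fit(X, y)            # each boosting round trains on an undersampled sample
    print(clf.score(X, y))
    ```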

Daniel Schwarz - One of the best experts on this subject based on the ideXlab platform.

  • Supervised, Multivariate, Whole-Brain Reduction Did Not Help to Achieve High Classification Performance in Schizophrenia Research
    Frontiers in neuroscience, 2016
    Co-Authors: Eva Janoušová, Giovanni Montana, Tomáš Kašpárek, Daniel Schwarz
    Abstract:

    We examined how penalized linear discriminant analysis with resampling, a supervised, multivariate, whole-brain reduction technique, can help schizophrenia diagnostics and research. In an experiment with magnetic resonance brain images of 52 first-episode schizophrenia patients and 52 healthy controls, this method allowed us to select brain areas relevant to schizophrenia, such as the left prefrontal cortex, the anterior cingulum, the right anterior insula, the thalamus, and the hippocampus. Nevertheless, the Classification Performance based on such reduced data was not significantly better than the Classification of data reduced by mass univariate selection using a t-test or by unsupervised multivariate reduction using principal component analysis. Moreover, we found no important influence of the type of imaging features, namely local deformations or grey matter volumes, or of the Classification method, specifically linear discriminant analysis or linear support vector machines, on the Classification results. However, we ascertained a significant effect of the cross-validation setting on Classification Performance, as Classification results were overestimated even though the resampling was performed during the selection of brain imaging features. Therefore, when no external validation set is available, it is critically important to perform cross-validation in all steps of the analysis (not only during Classification) to avoid optimistically biasing the results of Classification studies.
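
    The methodological point about cross-validation can be illustrated with a short sketch; penalized LDA with resampling is replaced here by a PCA plus LDA stand-in (an assumption), since the point being shown is only that the reduction step must be refit inside every fold.

    ```python
    # Keep the whole-brain reduction inside the cross-validation loop so it is
    # refit on every training fold; fitting it once on all subjects first would
    # optimistically bias the estimated Classification Performance.
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score

    pipeline = Pipeline([
        ("reduce", PCA(n_components=20)),        # stand-in for the reduction step
        ("clf", LinearDiscriminantAnalysis()),
    ])

    # X: (n_subjects, n_voxels) imaging features, y: patient vs. control labels
    # unbiased_scores = cross_val_score(pipeline, X, y, cv=10)
    ```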

Witold Pedrycz - One of the best experts on this subject based on the ideXlab platform.

  • Optimizing the prototypes with a novel data weighting algorithm for enhancing the Classification Performance of fuzzy clustering
    Fuzzy Sets and Systems, 2020
    Co-Authors: Witold Pedrycz, Weike Nie
    Abstract:

    Fuzzy clustering is regarded as an unsupervised learning process that constitutes a prerequisite for many other data mining techniques. Deciding how to classify data efficiently and accurately has been one of the topics pursued by many researchers. We anticipate that the Classification Performance of clustering is strongly dependent on the boundary data (viz. data located at the boundaries of the clusters). The boundary data carry some level of uncertainty and as such contain more information than other data; usually, the greater the uncertainty, the more information such data contain. To improve the quality of clustering, this study develops an augmented scheme of fuzzy clustering in which a novel weighted data-based fuzzy clustering is proposed. In the introduced scheme, a dataset is composed of boundary data and non-boundary data. The partition matrix is used to determine the boundary data and the non-boundary data to be considered subsequently in the clustering process. Then, we assign a different weight to each datum to construct the weighted data. During this process, the weights for the boundary data and the non-boundary data differ, so that the contributions of the boundary data and the non-boundary data to the prototypes are reduced and enhanced, respectively. Furthermore, we build a weighting function to determine the weights of the data. The weighted data are used to optimize the prototypes. With the optimized prototypes, the partition matrix can be refined, which ultimately optimizes the boundaries of the clusters. Finally, the Classification Performance of fuzzy clustering is enhanced. We offer a thorough analysis of the developed scheme. Comprehensive experimental studies involving synthetic and publicly available datasets are reported to demonstrate the Performance of the proposed approach.

  • Constructing a Virtual Space for Enhancing the Classification Performance of Fuzzy Clustering
    IEEE Transactions on Fuzzy Systems, 2019
    Co-Authors: Witold Pedrycz, Weike Nie
    Abstract:

    Clustering offers a general methodology and comes with a remarkably rich conceptual and algorithmic framework for data analysis and data interpretation. As one of the most representative algorithms of fuzzy clustering, fuzzy C-means (FCM) is a widely used objective function-based clustering method exploited in various applications. In this study, a virtual-space-based fuzzy clustering algorithm is proposed to improve the Classification Performance obtained with fuzzy clustering. This improvement is achieved by forming a virtual space based on the original data space. First, we construct a piecewise linear transformation function to modify the similarity matrix of the original data and build the so-called virtual similarity matrix (VSM). In the VSM, the effect of closeness becomes amplified; in other words, high similarity values (say, larger than α, the cutoff value separating large and small similarities in this paper) present in the original similarity matrix are made higher, whereas lower similarity levels (say, smaller than α) are further reduced. In addition, data with high similarity (say, larger than a certain threshold value) observed in the original space will overlap significantly in the virtual space (the attributes of the samples become exactly the same); the overlapping samples can be treated as one sample. This modification makes it easier to identify clusters. Second, we build a relationship matrix between the original dataset and the determined similarity values and present two closed-form solutions to the problem of building the relationship matrix. Subsequently, a virtual space of the original data space is derived through the modified similarity matrix and the introduced relationship matrix. We offer a thorough analysis of the developed clustering algorithm. The experimental results are in agreement with the underlying conceptual basis. Furthermore, the resulting Classification Performance is significantly improved compared with the results produced by FCM and kernel-based fuzzy C-means.

Kai Keng Ang - One of the best experts on this subject based on the ideXlab platform.

  • The predictive role of pre-cue EEG rhythms on MI-based BCI Classification Performance
    Journal of neuroscience methods, 2014
    Co-Authors: Atieh Bamdadian, Cuntai Guan, Kai Keng Ang
    Abstract:

    Background: One of the main issues in motor imagery-based (MI-based) brain–computer interface (BCI) systems is the large variation in the Classification Performance of BCI users. However, the exact reason for the low Performance of some users is still under investigation. Having some prior knowledge about the Performance of users may be helpful in understanding possible reasons for Performance variations. New method: In this study, a novel coefficient derived from pre-cue EEG rhythms is proposed. The proposed coefficient is computed from the spectral power of pre-cue EEG data for specific rhythms over different regions of the brain. The feasibility of predicting the Classification Performance of MI-based BCI users from the proposed coefficient is investigated. Results: Group-level analysis on N = 17 healthy subjects showed a significant correlation r = 0.53 (p = 0.02) between the proposed coefficient and the cross-validation accuracies of the subjects in performing MI. The results showed that subjects with higher cross-validation accuracies yielded significantly higher values of the proposed coefficient, and vice versa. Comparison with existing methods: In comparison with previous predictors, this coefficient captures spatial information from the brain in addition to spectral information. Conclusion: The result of using the proposed coefficient suggests that having higher frontal theta and lower posterior alpha prior to performing MI may enhance BCI Classification Performance. This finding reveals the prospect of designing a novel experiment to prepare the user for improved motor imagery Performance.
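
    A coefficient in the spirit of the paper (frontal theta power relative to posterior alpha power in the pre-cue window) can be sketched as follows; the channel grouping, band limits, and ratio form are illustrative assumptions, not the paper's exact definition.

    ```python
    # Pre-cue coefficient sketch: average frontal theta power divided by average
    # posterior alpha power, computed from Welch spectra of the pre-cue segment.
    import numpy as np
    from scipy.signal import welch
    from scipy.stats import pearsonr

    def band_power(sig, fs, lo, hi):
        f, pxx = welch(sig, fs=fs, nperseg=int(fs) * 2)
        return pxx[(f >= lo) & (f <= hi)].mean()

    def precue_coefficient(precue_eeg, fs, frontal_idx, posterior_idx):
        """precue_eeg: (n_channels, n_samples) EEG segment recorded before the MI cue."""
        theta = np.mean([band_power(precue_eeg[ch], fs, 4, 7) for ch in frontal_idx])
        alpha = np.mean([band_power(precue_eeg[ch], fs, 8, 13) for ch in posterior_idx])
        return theta / alpha        # higher frontal theta, lower posterior alpha -> larger value

    # coeffs = [precue_coefficient(seg, 250, frontal_chs, posterior_chs) for seg in segments]
    # r, p = pearsonr(coeffs, cv_accuracies)   # correlate with subjects' CV accuracies
    ```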