Classifier

The experts below are selected from a list of 1,480,140 experts worldwide ranked by the ideXlab platform.

Eduardo R Hruschka - One of the best experts on this subject based on the ideXlab platform.

  • Tweet sentiment analysis with classifier ensembles
    Decision Support Systems, 2014
    Co-Authors: Nadia Felix Felipe Da Silva, Eduardo R Hruschka
    Abstract:

    Twitter is a microblogging site in which users can post updates (tweets) to friends (followers). It has become an immense dataset of so-called sentiments. In this paper, we introduce an approach that automatically classifies the sentiment of tweets by using classifier ensembles and lexicons. Tweets are classified as either positive or negative with respect to a query term. This approach is useful for consumers, who can use sentiment analysis to search for products, for companies that aim to monitor public sentiment toward their brands, and for many other applications. Indeed, sentiment classification in microblogging services (e.g., Twitter) through classifier ensembles and lexicons has not been well explored in the literature. Our experiments on a variety of public tweet sentiment datasets show that classifier ensembles formed by Multinomial Naive Bayes, SVM, Random Forest, and Logistic Regression can improve classification accuracy. We show that classifier ensembles are promising for tweet sentiment analysis. We compare bag-of-words and feature hashing for the representation of tweets. Classifier ensembles obtained from bag-of-words and feature hashing are discussed.
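The two tweet representations and the idea of combining several classifiers can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hash_features`, `bag_of_words`, `majority_vote`), the bucket count, the MD5-based hash, and the use of simple majority voting as the combination rule are all assumptions chosen for illustration, and the base classifiers themselves are left out.

```python
import hashlib
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count vector indexed by an explicit vocabulary list (bag-of-words)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def hash_features(tokens, n_buckets=16):
    """Count vector of fixed length n_buckets (feature hashing).

    Each token is mapped to a bucket by a hash function, so no vocabulary
    has to be stored. MD5 is used here only because it is deterministic
    across runs; the bucket count is a hypothetical choice.
    """
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_buckets] += 1
    return vector


def majority_vote(labels):
    """Combine the labels predicted by several classifiers for one tweet.

    Assumed combination rule: the most frequent label wins.
    """
    return Counter(labels).most_common(1)[0][0]
```

With bag-of-words the vector length grows with the vocabulary, whereas feature hashing fixes the length in advance at the cost of possible collisions between distinct tokens. A call such as `majority_vote(["positive", "negative", "positive", "positive"])` returns the label that most of the (hypothetical) base classifiers agreed on.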

Liang Liang - One of the best experts on this subject based on the ideXlab platform.

  • Predicting corporate financial distress based on integration of support vector machine and logistic regression
    Expert Systems With Applications, 2007
    Co-Authors: Yu Wang, Xiaoyan Xu, Bin Zhang, Liang Liang
    Abstract:

    The support vector machine (SVM) has been applied to the problem of bankruptcy prediction, and has proved to be superior to competing methods such as the neural network, the linear multiple discriminant approaches, and logistic regression. However, the conventional SVM employs the structural risk minimization principle, so the empirical risk of misclassification may be high, especially when a point to be classified is close to the hyperplane. This paper develops an integrated binary discriminant rule (IBDR) for corporate financial distress prediction. The described approach decreases the empirical risk of SVM outputs by interpreting and modifying the outputs of the SVM classifiers according to the result of logistic regression analysis. That is, depending on the vector's relative distance from the hyperplane, if the result of logistic regression supports the output of the SVM classifier with a high probability, then IBDR will accept the output of the SVM classifier; otherwise, IBDR will modify the output of the SVM classifier. Our experimental results demonstrate that IBDR outperforms the conventional SVM.
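The decision rule described in this abstract can be read as the following sketch. The abstract does not give the exact rule, so everything specific here is an assumption: the function name `ibdr_label`, the `margin` and `min_support` thresholds, and the interpretation of "modify" as flipping the SVM label.

```python
def ibdr_label(svm_decision, lr_prob_positive, margin=1.0, min_support=0.5):
    """Return +1 or -1 by combining an SVM output with a logistic regression output.

    svm_decision     : signed distance-like SVM decision value for the point
    lr_prob_positive : logistic regression probability of the positive class
    margin           : hypothetical distance beyond which the SVM is trusted as-is
    min_support      : hypothetical probability needed for LR to "support" the SVM
    """
    svm_label = 1 if svm_decision >= 0 else -1

    # Far from the hyperplane: keep the SVM output unchanged (assumed behaviour).
    if abs(svm_decision) >= margin:
        return svm_label

    # Close to the hyperplane: check how strongly LR agrees with the SVM label.
    support = lr_prob_positive if svm_label == 1 else 1.0 - lr_prob_positive

    # Accept the SVM output if LR supports it; otherwise modify (here: flip) it.
    return svm_label if support >= min_support else -svm_label
```

For example, a point with `svm_decision=0.2` and `lr_prob_positive=0.1` lies close to the hyperplane and is not supported by logistic regression, so this sketch changes the label from +1 to -1.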

Taxiarchis Botsis - One of the best experts on this subject based on the ideXlab platform.

  • Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection
    Journal of the American Medical Informatics Association, 2011
    Co-Authors: Michael D. Nguyen, Taxiarchis Botsis, Marianthi Markatou, Robert Ball
    Abstract:

    Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.

    Design: We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N_pos = 237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.

    Measurements: Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and by Friedman's test; misclassification error rate analysis was also performed.

    Results: The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).

    Conclusion: Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
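Macro-averaged recall, precision, and F-measure, as named in the Measurements above, can be computed as in the sketch below. The function names are hypothetical, and the macro F-measure is taken here as the mean of the per-class F1 scores, which is one common convention; the abstract does not say which variant the authors used.

```python
def per_class_counts(y_true, y_pred, label):
    """True positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Each class contributes equally to the average, regardless of how many
    reports belong to it, which matters when positives are rare.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n
```

In a binary setting such as positive versus negative for anaphylaxis, the recall of the positive class is the sensitivity and the recall of the negative class is the specificity, so macro-recall is their unweighted mean.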
