Text Categorization - Explore the Science & Experts

The Experts below are selected from a list of 16137 Experts worldwide ranked by the ideXlab platform.

Xian Jia-yang - One of the best experts on this subject based on the ideXlab platform.

Text Categorization Algorithm Based on Centroid

Computer Engineering, 2009

Co-Authors: Xian Jia-yang

Abstract:

The centroid-based Text Categorization algorithm performs poorly when the documents of a class are dispersed or their distribution has more than one peak. To address this problem, this paper proposes an improved Text Categorization algorithm that outperforms the classical centroid-based Categorization algorithm. Experimental results on the document set provided by Wisers Information Limited show that the algorithm achieves satisfactory efficiency and precision.
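
For orientation, the classical centroid-based baseline that the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's improved algorithm: the bag-of-words weighting, the cosine similarity, the toy documents, and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs):
    """Average the term vectors of each class into one centroid per class."""
    sums, counts = defaultdict(Counter), Counter()
    for text, label in labeled_docs:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids):
    """Assign the class whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("stocks fell as markets closed lower", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sports"),
    ("the striker scored two goals", "sports"),
]
CENTROIDS = train_centroids(TRAIN)
```

Because each class is summarized by a single averaged vector, a class whose documents form several separate clusters is represented by a point that may lie between them, which is the weakness the abstract describes.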

Qin Gang - One of the best experts on this subject based on the ideXlab platform.

Active Learning Based Text Categorization

Computer Science, 2003

Co-Authors: Qin Gang

Abstract:

In the field of Text Categorization, the number of unlabeled documents is generally much greater than that of labeled documents. Text Categorization is a Categorization problem in a high-dimensional vector space, and more training samples generally improve the accuracy of a Text classifier. How to add unlabeled documents to the training set so as to expand it is therefore a valuable problem. This paper introduces the theory of active learning and applies it to Text Categorization, exploring how unlabeled documents can be used to improve the accuracy of a Text classifier. The technique is expected to improve classifier accuracy by incorporating a relatively large number of unlabeled document samples. We propose an active learning based algorithm for Text Categorization, and experiments on the Reuters news corpus show that, when enough training samples are available, the algorithm effectively improves the Text classifier's accuracy by incorporating unlabeled document samples.
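
The abstract does not say which selection strategy the algorithm uses. As a rough sketch of pool-based active learning in general, the loop below uses uncertainty sampling, a common strategy chosen here as an assumption. The `train` and `oracle` callables and all names are hypothetical placeholders for a real classifier and a human annotator.

```python
def uncertainty(prob_positive):
    """Uncertainty of a binary prediction: highest at p = 0.5, lowest at 0 or 1."""
    return 1.0 - abs(2.0 * prob_positive - 1.0)


def select_queries(pool, predict_proba, k=1):
    """Return the k pool documents the current classifier is least sure about."""
    ranked = sorted(
        pool, key=lambda doc: uncertainty(predict_proba(doc)), reverse=True
    )
    return ranked[:k]


def active_learning_loop(labeled, pool, train, oracle, rounds=3, k=1):
    """Grow the labeled set by querying labels for the most uncertain documents.

    train(labeled) must return a function mapping a document to P(positive);
    oracle(doc) must return the true label (for example, from an annotator).
    """
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        predict_proba = train(labeled)
        for doc in select_queries(pool, predict_proba, k):
            labeled.append((doc, oracle(doc)))
            pool.remove(doc)
    return labeled
```

The classifier is retrained at the start of every round, so each batch of queries reflects what the model has learned from the labels gathered so far.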

Michael R Lyu - One of the best experts on this subject based on the ideXlab platform.

CIKM - Semi-supervised Text Categorization by active search

Proceedings of the 17th ACM conference on Information and knowledge management - CIKM '08, 2008

Co-Authors: Rong Jin, Michael R Lyu, Kaizhu Huang, Irwin King

Abstract:

In automated Text Categorization, given only a small number of labeled documents, it is very challenging, if not impossible, to build a reliable classifier that achieves high classification accuracy. To address this problem, this paper proposes a novel web-assisted Text Categorization framework. Important keywords are first automatically identified from the available labeled documents to form queries. Search engines are then used to retrieve a multitude of relevant documents from the Web, which are then exploited by a semi-supervised framework. To the best of our knowledge, this work is the first study of this kind. An extensive experimental study shows encouraging results for the proposed Text Categorization framework: using Google as the web search engine, it reduces the classification error by 30% compared with the state-of-the-art supervised Text Categorization method.
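
The keyword-to-query step can be illustrated with a small sketch. The abstract does not describe how keywords are scored, so the scoring rule below (document frequency inside the class minus document frequency outside it) is purely an assumption, as are the function names. No search engine is called; the sketch only builds the query string.

```python
from collections import Counter


def top_keywords(labeled_docs, target_label, k=3):
    """Rank terms by document frequency inside the class minus outside it."""
    inside, outside = Counter(), Counter()
    for text, label in labeled_docs:
        terms = set(text.lower().split())  # count each term once per document
        (inside if label == target_label else outside).update(terms)
    scores = {term: inside[term] - outside[term] for term in inside}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


def build_query(labeled_docs, target_label, k=3):
    """Join the top keywords of a class into a single search query string."""
    return " ".join(top_keywords(labeled_docs, target_label, k))
```

The documents returned for such a query would then serve as additional, unlabeled or weakly labeled, material for the semi-supervised stage.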

Large-scale Text Categorization by batch mode active learning

Proceedings of the 15th international conference on World Wide Web - WWW '06, 2006

Co-Authors: Steven Chu Hong Hoi, Rong Jin, Michael R Lyu

Abstract:

Large-scale Text Categorization is an important research topic for Web data mining. One of the challenges in large-scale Text Categorization is how to reduce the human effort required to label Text documents for building reliable classification models. In the past, there have been many studies on applying active learning methods to automatic Text Categorization, which try to select the most informative documents for manual labeling. Most of these studies focused on selecting a single unlabeled document in each iteration. As a result, the Text Categorization model has to be retrained after each labeled document is solicited. In this paper, we present a novel active learning algorithm that selects a batch of Text documents for manual labeling in each iteration. The key to batch mode active learning is reducing the redundancy among the selected examples so that each example provides unique information for updating the model. To this end, we use the Fisher information matrix as the measure of model uncertainty and choose the set of documents that effectively maximizes the Fisher information of a classification model. Extensive experiments with three different datasets have shown that our algorithm is more effective than the state-of-the-art active learning techniques for Text Categorization and can be a promising tool for large-scale Text Categorization of World Wide Web documents.
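
As a reminder of what the Fisher information matrix looks like in a concrete case, consider a binary logistic regression classifier. The choice of model is an assumption made for illustration; the abstract does not name the classifier, and the symbols $w$, $x$, $S$, and $\pi$ are introduced here only for this example.

```latex
% Binary logistic regression with labels y in {-1, +1} and weight vector w:
\[
  p(y \mid x; w) = \frac{1}{1 + \exp(-y\, w^{\top} x)},
  \qquad
  \pi(x) = \frac{1}{1 + \exp(-w^{\top} x)} .
\]
% Fisher information matrix of w, averaged over a set S of documents:
\[
  I_S(w) = \frac{1}{|S|} \sum_{x \in S} \pi(x)\,\bigl(1 - \pi(x)\bigr)\, x\, x^{\top} .
\]
```

Each document contributes the most when $\pi(x)$ is close to $1/2$, that is, when the model is uncertain about it, while two nearly identical documents contribute nearly the same rank-one term $x x^{\top}$. This is one way to see why a batch chosen for its Fisher information favors examples that are both uncertain and mutually non-redundant.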

Rong Jin - One of the best experts on this subject based on the ideXlab platform.

CIKM - Semi-supervised Text Categorization by active search

Proceedings of the 17th ACM conference on Information and knowledge management - CIKM '08, 2008

Co-Authors: Rong Jin, Michael R Lyu, Kaizhu Huang, Irwin King

Large-scale Text Categorization by batch mode active learning

Proceedings of the 15th international conference on World Wide Web - WWW '06, 2006

Co-Authors: Steven Chu Hong Hoi, Rong Jin, Michael R Lyu

Enhong Chen - One of the best experts on this subject based on the ideXlab platform.

On the strength of hyperclique patterns for Text Categorization

Information Sciences, 2007

Co-Authors: Tieyun Qian, Hui Xiong, Yuanzhen Wang, Enhong Chen

Abstract:

The use of association patterns for Text Categorization has attracted great interest and a variety of useful methods have been developed. However, the key characteristics of pattern-based Text Categorization remain unclear. Indeed, there are still no concrete answers for the following two questions: Firstly, what kind of association pattern is the best candidate for pattern-based Text Categorization? Secondly, what is the most desirable way to use patterns for Text Categorization? In this paper, we focus on answering the above two questions. More specifically, we show that hyperclique patterns are more desirable than frequent patterns for Text Categorization. Along this line, we develop an algorithm for Text Categorization using hyperclique patterns. As demonstrated by our experimental results on various real-world Text documents, our method provides much better computational performance than state-of-the-art methods while retaining classification accuracy.
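
To make the notion of a hyperclique pattern concrete, the sketch below computes the h-confidence of a term set: its support divided by the largest support of any single term in it. A term set is commonly called a hyperclique pattern when its h-confidence reaches a chosen threshold (usually together with a minimum support, omitted here). The function names and the representation of documents as sets of terms are assumptions for this example, not the paper's implementation.

```python
def support(itemset, transactions):
    """Fraction of transactions (documents as term sets) containing every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def h_confidence(itemset, transactions):
    """Support of the whole itemset divided by the largest single-item support."""
    joint = support(itemset, transactions)
    max_single = max(support([item], transactions) for item in itemset)
    return joint / max_single if max_single else 0.0


def is_hyperclique(itemset, transactions, min_hconf):
    """True when the itemset's h-confidence reaches the given threshold."""
    return h_confidence(itemset, transactions) >= min_hconf
```

A high h-confidence means that the presence of any one term in the set strongly implies the presence of all the others, which is a stricter requirement than being frequent.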

CIKM - Adapting association patterns for Text Categorization: weaknesses and enhancements

Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06, 2006

Co-Authors: Tieyun Qian, Hui Xiong, Yuanzhen Wang, Enhong Chen

Abstract:

The use of association patterns for Text Categorization has attracted great interest and a variety of useful methods have been developed. However, the key characteristics of pattern-based Text Categorization remain unclear. Indeed, there are still no concrete answers to the following two questions: First, what kind of association pattern is the best candidate for pattern-based Text Categorization? Second, what is the most desirable way to use patterns for Text Categorization? In this paper, we focus on answering these two questions. Specifically, we show that hyperclique patterns are more desirable than frequent patterns for Text Categorization. Along this line, we develop an algorithm for Text Categorization using hyperclique patterns. The experimental results show that our method outperforms state-of-the-art methods in both computational performance and classification accuracy.
