Text Clustering - Explore the Science & Experts

The experts below are selected from a list of 24255 experts worldwide ranked by the ideXlab platform.

Li Xiang - One of the best experts on this subject based on the ideXlab platform.

VSM-based Text Clustering Algorithm

Computer Engineering, 2008

Co-Authors: Li Xiang

Abstract:

Text clustering, one of the most important research branches of clustering, is the application of clustering algorithms to text processing. This paper discusses different Vector Space Model (VSM)-based clustering algorithms and presents an improved text clustering algorithm, the Level-Panel (LP) algorithm. In addition, based on the effects of clustering on the corpus, it presents optimizations of the clustering algorithm, including dimension determination and feature selection. The results show that the LP algorithm can effectively reduce the time spent in the clustering process and that it is highly practical and flexible.
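
The abstract does not describe the LP algorithm itself, so the following is only a minimal sketch of the Vector Space Model representation that such algorithms build on. It assumes TF-IDF weighting and cosine similarity; the function names `tfidf_vectors` and `cosine` and the sample documents are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: documents 0 and 1 share most terms, document 2 shares none.
docs = [
    "text clustering groups similar documents".split(),
    "document clustering groups similar text".split(),
    "genetic algorithms optimize fitness functions".split(),
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])
sim_02 = cosine(vecs[0], vecs[2])
```

A clustering algorithm in this setting would typically group documents whose vectors have high cosine similarity; feature selection and dimension reduction, as mentioned in the abstract, amount to shrinking the set of terms kept in each vector.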

Zhang Wansha - One of the best experts on this subject based on the ideXlab platform.

Web Text Clustering Method Based on Topic

Journal of Computer Applications, 2014

Co-Authors: Zhang Wansha

Abstract:

Because traditional Web text clustering algorithms ignore topic information, they achieve low accuracy on multi-topic Web text. To address this, a new topic-based Web text clustering algorithm was proposed. The method clusters multi-topic Web text in three steps: topic extraction, feature extraction, and text clustering. Unlike traditional Web text clustering algorithms, the proposed method takes the topic information of the Web text fully into account. The experimental results show that the accuracy of the proposed algorithm for multi-topic Web text clustering is higher than that of text clustering methods based on K-means or HowNet.

Xiaowei Xu - One of the best experts on this subject based on the ideXlab platform.

Frequent Term-Based Text Clustering

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02, 2002

Co-Authors: Florian Beil, Martin Ester, Xiaowei Xu

Abstract:

Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
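
To make the idea of frequent term sets and their supporting documents concrete, here is a small sketch. It is not the FTC or HFTC algorithm: it enumerates term sets by brute force rather than with an association rule mining algorithm, and `mean_overlap` is an assumed, simplified overlap score. The names `frequent_term_sets`, `mean_overlap`, `min_support`, and `max_size` are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Return {term set: cover}, where the cover is the set of indices of
    documents that contain every term in the set."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(
                i for i, d in enumerate(doc_sets) if d.issuperset(terms)
            )
            if len(cover) >= min_support:
                result[frozenset(terms)] = cover
    return result


def mean_overlap(candidate, candidates):
    """Average number of other candidates that also cover each document
    in this candidate's cover (an assumed, simplified overlap score)."""
    cover = candidates[candidate]
    total = sum(
        sum(1 for c in candidates.values() if i in c) - 1 for i in cover
    )
    return total / len(cover)


# Hypothetical toy corpus of tokenized documents.
docs = [
    ["web", "text", "cluster"],
    ["web", "text", "topic"],
    ["gene", "fitness"],
    ["gene", "fitness", "text"],
]
fts = frequent_term_sets(docs, min_support=2)
overlap = mean_overlap(frozenset({"gene", "fitness"}), fts)
```

Each frequent term set acts as a cluster candidate whose cover is the candidate cluster and whose terms serve as a readable description. A greedy procedure could then repeatedly pick the candidate with the lowest overlap, which is one way to read the abstract's "mutual overlap of frequent sets".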

Zhou Xianzhong - One of the best experts on this subject based on the ideXlab platform.

A Novel Text Clustering Algorithm Based on Niching Technique

Computer Engineering, 2006

Co-Authors: Zhou Xianzhong

Abstract:

This paper presents an unsupervised, robust text clustering method based on a niching genetic algorithm, in which text clustering in feature space is transformed into a multimodal function optimization problem within the context of genetic niching. The peaks of the multimodal function, which constitute the final text cluster centers, are identified using improved deterministic crowding. The fitness function is constructed in terms of a density estimator of the data points. The niching radius can be adjusted dynamically by an iterative hill-climbing method coupled with genetic optimization of the text cluster centers. As a result, the number of text clusters can be obtained adaptively. The experimental results show that the algorithm is effective and efficient in dealing with the problem of text clustering.
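
The abstract does not give the form of the density estimator, so the sketch below assumes a Gaussian kernel purely for illustration. It shows how a density-based fitness function scores a candidate cluster center: candidates in dense regions score high, so the peaks of this function correspond to cluster centers. The name `density_fitness`, the `radius` parameter, and the sample points are hypothetical.

```python
import math


def density_fitness(center, points, radius):
    """Gaussian kernel density estimate at `center`.

    Higher values mean more data points lie within roughly `radius`
    of the candidate center.
    """
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(center, p))
        total += math.exp(-sq_dist / (2.0 * radius ** 2))
    return total / len(points)


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
dense = density_fitness((0.05, 0.05), points, radius=0.5)
sparse = density_fitness((2.5, 2.5), points, radius=0.5)
```

In a niching genetic algorithm, individuals would encode candidate centers and be evaluated with a fitness of this kind, while the niching mechanism keeps the population spread over several peaks instead of converging to a single one.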
