Text Clustering


The experts below are selected from a list of 24255 experts worldwide ranked by the ideXlab platform.

Li Xiang - One of the best experts on this subject based on the ideXlab platform.

  • VSM-based Text Clustering Algorithm
    Computer Engineering, 2008
    Co-Authors: Li Xiang
    Abstract:

    Text clustering, one of the most important research branches of clustering, is the application of clustering algorithms to text processing. This paper discusses different Vector Space Model (VSM)-based clustering algorithms and presents an improved text clustering algorithm, the Level-Panel (LP) algorithm. In addition, based on how clustering performs on the corpus, it presents optimizations of the clustering algorithm, including dimension determination and feature selection. The LP algorithm is shown to effectively reduce the time spent in the clustering process, and it is highly practical and flexible.
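
As a rough illustration of the vector space model that the abstract above builds on (not the Level-Panel algorithm itself, whose details the abstract does not give), the sketch below turns documents into TF-IDF vectors and compares them with cosine similarity. The function names `tfidf_vectors` and `cosine`, the whitespace tokenizer, and the particular TF-IDF weighting are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight dicts) for a list of strings.

    Assumption: documents are tokenized by lowercasing and splitting on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    sample = ["cat sat mat", "cat sat hat", "stock market crash"]
    vecs = tfidf_vectors(sample)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Any VSM-based clustering algorithm can then operate on these vectors. Feature selection, which the abstract mentions as one optimization, would correspond to dropping low-value terms from the dictionaries before the similarities are computed.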

Zhang Wansha - One of the best experts on this subject based on the ideXlab platform.

  • Web Text Clustering method based on topic
    Journal of Computer Applications, 2014
    Co-Authors: Zhang Wansha
    Abstract:

    Traditional Web text clustering algorithms do not consider the topic information of Web text, which leads to low accuracy when clustering multi-topic Web text. To address this, a new topic-based Web text clustering algorithm was proposed. The method clusters multi-topic Web text in three steps: topic extraction, feature extraction, and text clustering. Unlike traditional Web text clustering algorithms, the proposed method takes the topic information of Web text fully into account. The experimental results show that the proposed algorithm achieves a higher accuracy rate on multi-topic Web text clustering than text clustering methods based on K-means or HowNet.

Xiaowei Xu - One of the best experts on this subject based on the ideXlab platform.

  • Frequent term-based Text Clustering
    Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02, 2002
    Co-Authors: Florian Beil, Martin Ester, Xiaowei Xu
    Abstract:

    Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases, and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
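
The idea of clustering by frequent term sets and their mutual overlap can be sketched roughly as follows. This is a simplified illustration, not the FTC or HFTC algorithm as published. Frequent term sets are found by brute-force enumeration rather than an association rule miner, and the overlap score (the average number of other candidates that also cover each supporting document) is an assumed stand-in for the paper's measure. The names `frequent_term_sets`, `overlap`, and `greedy_flat_clustering` are hypothetical.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (indices of supporting documents)."""
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    covers = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(
                i for i, t in enumerate(doc_terms) if t.issuperset(terms)
            )
            # Keep only non-empty covers that reach the support threshold.
            if len(cover) >= max(min_support, 1):
                covers[frozenset(terms)] = cover
    return covers


def overlap(term_set, candidates):
    """Average number of other candidates covering each document of this one."""
    cover = candidates[term_set]
    shared = sum(
        1
        for d in cover
        for other, c in candidates.items()
        if other != term_set and d in c
    )
    return shared / len(cover)


def greedy_flat_clustering(covers, n_docs):
    """Repeatedly pick the candidate with the least overlap as the next cluster."""
    remaining = dict(covers)
    unassigned = set(range(n_docs))
    clusters = []
    while unassigned and remaining:
        # Ties are broken by larger cover, then alphabetically, for determinism.
        best = min(
            remaining,
            key=lambda ts: (overlap(ts, remaining), -len(remaining[ts]), sorted(ts)),
        )
        chosen = remaining.pop(best)
        clusters.append((set(best), set(chosen)))
        unassigned -= chosen
        # Remove the clustered documents from every other candidate's cover.
        remaining = {ts: c - chosen for ts, c in remaining.items() if c - chosen}
    return clusters
```

Each returned cluster carries the term set that selected it, which serves as the kind of understandable cluster description the abstract refers to. Documents that support no frequent term set stay unassigned in this sketch.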

Zhou Xianzhong - One of the best experts on this subject based on the ideXlab platform.

  • A Novel Text Clustering Algorithm Based on Niching Technique
    Computer Engineering, 2006
    Co-Authors: Zhou Xianzhong
    Abstract:

    This paper presents a robust unsupervised text clustering method based on a niching genetic algorithm, in which text clustering in feature space is transformed into a multimodal function optimization problem within the context of genetic niching. The peaks of the multimodal function, which constitute the final text clustering centers, are identified using improved deterministic crowding. The fitness function is constructed from a density estimator of the data points. The niching radius can be dynamically adjusted by an iterative hill-climbing method coupled with genetic optimization of the text cluster centers. As a result, the number of text clusters can be obtained adaptively. The experimental results show that the algorithm is effective and efficient for text clustering.
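
Two ingredients named in the abstract above, a density-based fitness function and deterministic crowding, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's improved variant. The Gaussian kernel, the `bandwidth` parameter, and the names `density_fitness` and `crowding_replace` are assumptions, and the crossover, mutation, and adaptive niching radius steps are omitted.

```python
import math


def density_fitness(candidate, points, bandwidth=1.0):
    """Gaussian kernel density estimate at `candidate`.

    Higher values mean the candidate center sits in a denser region of the data.
    """
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(candidate, p))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(points)


def crowding_replace(parent_pair, child_pair, fitness):
    """Deterministic crowding: each child competes with its more similar parent.

    Returns the two survivors, one from each parent-child pairing.
    """
    (p1, p2), (c1, c2) = parent_pair, child_pair
    # Choose the parent-child pairing with the smaller total distance.
    if math.dist(p1, c1) + math.dist(p2, c2) <= math.dist(p1, c2) + math.dist(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    # A child replaces its paired parent only if it is strictly fitter.
    return [c if fitness(c) > fitness(p) else p for p, c in pairs]
```

Because each child only ever replaces the parent closest to it, candidates near different density peaks do not compete with one another. This is how several cluster centers can survive in one population.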
