Textual Document

The Experts below are selected from a list of 225 Experts worldwide ranked by ideXlab platform

Wai Lam - One of the best experts on this subject based on the ideXlab platform.

  • Automatic textual document categorization based on generalized instance sets and a metamodel
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
    Co-Authors: Wai Lam, Yiqiu Han
    Abstract:

    We propose a new approach to text categorization, known as the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
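
    The idea of building a generalized instance from refined original instances can be sketched as follows. This is only an illustration: the abstract does not describe the construction, so the sketch assumes a Rocchio-style generalization over the k nearest neighbours of a seed document, and the names `cosine`, `generalize`, `score`, `beta`, and `gamma` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generalize(seed, docs, labels, k=3, beta=1.0, gamma=0.5):
    """Assumed construction: average the positive neighbours of `seed`,
    subtract a damped average of the negative ones, keep positive weights."""
    order = sorted(range(len(docs)), key=lambda i: cosine(seed, docs[i]), reverse=True)
    neighbours = order[:k]
    pos = [docs[i] for i in neighbours if labels[i]]
    neg = [docs[i] for i in neighbours if not labels[i]]
    g = {}
    for group, signed_weight in ((pos, beta), (neg, -gamma)):
        for d in group:
            for t, w in d.items():
                g[t] = g.get(t, 0.0) + signed_weight * w / len(group)
    return {t: w for t, w in g.items() if w > 0}

def score(doc, generalized_instances):
    """Score a document by its closest generalized instance (a k-NN-like step
    applied to linear-classifier-like prototypes)."""
    return max(cosine(doc, g) for g in generalized_instances)

# Toy corpus: two documents in the category, two outside it.
docs = [
    {"wheat": 1.0, "grain": 1.0},
    {"wheat": 1.0, "export": 1.0},
    {"oil": 1.0, "barrel": 1.0},
    {"oil": 1.0, "price": 1.0},
]
labels = [True, True, False, False]

gi = generalize(docs[0], docs, labels, k=3)
in_category = score({"wheat": 1.0, "grain": 1.0}, [gi])
out_of_category = score({"oil": 1.0}, [gi])
```

    In this toy run the generalized instance keeps only terms supported by the positive neighbours, so the in-category document scores higher than the out-of-category one.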

  • Meta-learning models for automatic textual document categorization
    Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2001
    Co-Authors: Kwokyin Lai, Wai Lam
    Abstract:

    We investigate two meta-model approaches for the task of automatic textual document categorization. The first is the linear combination approach. Based on the idea of distilling the characteristics of how we estimate the merits of each component algorithm, we propose three different strategies for the linear combination approach. The linear combination approach uses only limited knowledge from the training document set. To address this limitation, we propose the second meta-model approach, called Meta-learning Using Document Feature characteristics (MUDOF), which employs a meta-learning phase using document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Extensive experiments have been conducted on a real-world document collection, and satisfactory performance is obtained.
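
    A linear combination of component classifiers can be sketched as below. The abstract does not spell out its three strategies, so this shows just one plausible scheme under a stated assumption: each component's weight is proportional to a merit estimate (for example, a validation score). The names `merit_weights` and `combine` and the component names are hypothetical.

```python
def merit_weights(merits):
    """Turn per-component merit estimates into weights that sum to 1.
    Falls back to uniform weights if every merit is zero."""
    total = sum(merits.values())
    if total == 0:
        return {name: 1.0 / len(merits) for name in merits}
    return {name: m / total for name, m in merits.items()}

def combine(scores, weights):
    """Weighted sum of the component scores for one (document, category) pair."""
    return sum(weights[name] * s for name, s in scores.items())

# Assumed merit estimates for two hypothetical component algorithms.
weights = merit_weights({"knn": 3.0, "linear": 1.0})

# Scores the two components assign to the same document for one category.
combined = combine({"knn": 0.8, "linear": 0.4}, weights)
```

    The combined score can then be thresholded per category in the same way as a single classifier's score.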

  • Automatic textual document categorization using multiple similarity-based models
    SIAM International Conference on Data Mining, 2001
    Co-Authors: Kwokyin Lai, Wai Lam
    Abstract:

    We develop a similarity-based textual document categorization method called the generalized instance set (GIS) algorithm. GIS integrates the advantages of linear classifiers and the k-nearest neighbour algorithm by generalizing selected instances. To further enhance performance, we propose a meta-model framework that combines the strengths of different variants of the GIS algorithm, as well as existing state-of-the-art algorithms, using multivariate regression analysis on document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Unlike existing categorization methods, our proposed meta-model can automatically recommend a suitable algorithm for each category based on category-specific statistical characteristics. In addition, our meta-model differs from existing multi-strategy learning in that it is not limited in the number or type of component classifiers. By flexibly adding and substituting classifiers, incremental gains in classification performance can be obtained. Extensive experiments have been conducted. The results confirm that our meta-model approach can exploit the advantages of its component algorithms and demonstrates better performance than existing algorithms.

    Author affiliations: ∗ Corresponding author, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong, {kylai@se.cuhk.edu.hk}; † Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong, {wlam@se.cuhk.edu.hk}

  • Modeling textual document classification
    Systems Man and Cybernetics, 1999
    Co-Authors: Wai Lam
    Abstract:

    We investigate existing rule-based techniques for automatic textual document classification. The weaknesses of these techniques are identified. We propose a new technique, known as the IBRI algorithm, that unifies the strengths of rule-based and instance-based methods. Our algorithm adapts to the characteristics of text classification problems. Experiments have been conducted to demonstrate the effectiveness of our IBRI algorithm. Moreover, we compare its performance with existing rule-based and instance-based algorithms. The results show that IBRI performs better most of the time.

P Arnaudo - One of the best experts on this subject based on the ideXlab platform.

  • Automated text classification using a dynamic artificial neural network model
    Expert Systems With Applications, 2012
    Co-Authors: M Ghiassi, M Olschimke, B Moon, P Arnaudo
    Abstract:

    Widespread digitization of information in today's internet age has intensified the need for effective textual document classification algorithms. Most real-life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naive Bayes (NB), Neural Network (NN), Linear Least Squares Fit (LLSF), k-Nearest-Neighbor (kNN), and Support Vector Machines (SVM), with SVMs showing better results in most cases. In this paper we introduce a new approach called dynamic architecture for artificial neural networks (DAN2) as an alternative for solving textual document classification problems. DAN2 is a scalable algorithm that does not require parameter settings or network architecture configuration. To show that DAN2 is an effective and scalable alternative for text classification, we present comparative results for the Reuters-21578 benchmark dataset. Our results show that DAN2 performs very well against the current leading solutions (kNN and SVM) using established classification metrics.
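
    The abstract does not name the "established classification metrics" it uses. As an illustration only, the sketch below computes precision, recall, and F1 with macro- and micro-averaging over categories, which are commonly reported for multi-label benchmarks such as Reuters-21578. The function names and the toy labels are assumptions, not taken from the paper.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts (0.0 where a ratio is undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(gold, predicted, categories):
    """Return (macro_f1, micro_f1) for multi-label assignments.
    `gold` and `predicted` are lists of category sets, one per document."""
    counts = {}
    for c in categories:
        tp = sum(1 for g, p in zip(gold, predicted) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, predicted) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, predicted) if c in g and c not in p)
        counts[c] = (tp, fp, fn)
    # Macro: average the per-category F1 values.
    macro_f1 = sum(prf(*counts[c])[2] for c in categories) / len(categories)
    # Micro: pool the counts over all categories, then compute F1 once.
    pooled = [sum(col) for col in zip(*counts.values())]
    micro_f1 = prf(*pooled)[2]
    return macro_f1, micro_f1

# Toy multi-label example with two hypothetical categories.
gold = [{"earn"}, {"acq"}, {"earn", "acq"}, {"earn"}]
predicted = [{"earn"}, {"earn"}, {"acq"}, {"earn"}]
macro_f1, micro_f1 = evaluate(gold, predicted, ["earn", "acq"])
```

    Macro-averaging weights every category equally, while micro-averaging is dominated by frequent categories, so the two can diverge on skewed collections.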

Kwokyin Lai - One of the best experts on this subject based on the ideXlab platform.

  • Meta-learning models for automatic textual document categorization
    Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2001
    Co-Authors: Kwokyin Lai, Wai Lam
    Abstract:

    We investigate two meta-model approaches for the task of automatic textual document categorization. The first is the linear combination approach. Based on the idea of distilling the characteristics of how we estimate the merits of each component algorithm, we propose three different strategies for the linear combination approach. The linear combination approach uses only limited knowledge from the training document set. To address this limitation, we propose the second meta-model approach, called Meta-learning Using Document Feature characteristics (MUDOF), which employs a meta-learning phase using document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Extensive experiments have been conducted on a real-world document collection, and satisfactory performance is obtained.

  • Automatic textual document categorization using multiple similarity-based models
    SIAM International Conference on Data Mining, 2001
    Co-Authors: Kwokyin Lai, Wai Lam
    Abstract:

    We develop a similarity-based textual document categorization method called the generalized instance set (GIS) algorithm. GIS integrates the advantages of linear classifiers and the k-nearest neighbour algorithm by generalizing selected instances. To further enhance performance, we propose a meta-model framework that combines the strengths of different variants of the GIS algorithm, as well as existing state-of-the-art algorithms, using multivariate regression analysis on document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Unlike existing categorization methods, our proposed meta-model can automatically recommend a suitable algorithm for each category based on category-specific statistical characteristics. In addition, our meta-model differs from existing multi-strategy learning in that it is not limited in the number or type of component classifiers. By flexibly adding and substituting classifiers, incremental gains in classification performance can be obtained. Extensive experiments have been conducted. The results confirm that our meta-model approach can exploit the advantages of its component algorithms and demonstrates better performance than existing algorithms.

M Ghiassi - One of the best experts on this subject based on the ideXlab platform.

  • Automated text classification using a dynamic artificial neural network model
    Expert Systems With Applications, 2012
    Co-Authors: M Ghiassi, M Olschimke, B Moon, P Arnaudo
    Abstract:

    Widespread digitization of information in today's internet age has intensified the need for effective textual document classification algorithms. Most real-life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naive Bayes (NB), Neural Network (NN), Linear Least Squares Fit (LLSF), k-Nearest-Neighbor (kNN), and Support Vector Machines (SVM), with SVMs showing better results in most cases. In this paper we introduce a new approach called dynamic architecture for artificial neural networks (DAN2) as an alternative for solving textual document classification problems. DAN2 is a scalable algorithm that does not require parameter settings or network architecture configuration. To show that DAN2 is an effective and scalable alternative for text classification, we present comparative results for the Reuters-21578 benchmark dataset. Our results show that DAN2 performs very well against the current leading solutions (kNN and SVM) using established classification metrics.

Wang Wenjie - One of the best experts on this subject based on the ideXlab platform.

  • Validation of textual document clustering techniques
    Computer Engineering, 2007
    Co-Authors: Wang Wenjie
    Abstract:

    This paper presents quality evaluation criteria. Based on these criteria, three document clustering algorithms are assessed experimentally. The comparison and analysis show that the STC (Suffix Tree Clustering) algorithm performs better than the k-Means and Ant-based clustering algorithms. STC's better performance comes from taking linguistic properties into account when processing the documents. The performance of the Ant-based clustering algorithm varies with its input variables. Adopting linguistic properties is necessary to improve the performance of Ant-based text clustering.
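
    The abstract does not say which quality criteria it uses. As one common example of an external clustering criterion, and purely as an assumption, the sketch below computes purity: the fraction of documents that carry the majority label of their cluster. The function name `purity` and the toy data are hypothetical.

```python
from collections import Counter

def purity(cluster_ids, true_labels):
    """Fraction of documents whose true label is the majority label
    of the cluster they were assigned to (1.0 is a perfect score)."""
    by_cluster = {}
    for c, y in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, []).append(y)
    # For each cluster, count the documents carrying its most frequent label.
    majority_total = sum(
        Counter(ys).most_common(1)[0][1] for ys in by_cluster.values()
    )
    return majority_total / len(true_labels)

# Six documents, two clusters, two reference topics.
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "a"]
result = purity(cluster_ids, true_labels)
```

    Purity rewards many small clusters, so it is usually reported alongside a measure that also penalizes splitting a topic, such as an F-measure over clusters.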