Textual Document - Explore the Science & Experts

The experts below are selected from a list of 225 experts worldwide, ranked by the ideXlab platform.

Wai Lam - One of the best experts on this subject based on the ideXlab platform.

Automatic textual document categorization based on generalized instance sets and a metamodel

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003

Co-Authors: Wai Lam, Yiqiu Han

Abstract:

We propose a new approach to text categorization known as the generalized instance set (GIS) algorithm, under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase which discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
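The idea of combining k-NN locality with a linear profile can be illustrated with a toy sketch. This is not the published GIS procedure: the abstract does not give the generalization function or the selection rule, so the sketch assumes that each "generalized instance" is simply the centroid of a seed document and its nearest neighbours, and that a new document is scored by its best cosine similarity to any generalized instance. All function names and the example vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average several sparse vectors into one linear profile."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def build_generalized_set(positives, k):
    """Assumed generalization step: repeatedly take a seed instance,
    merge it with its k nearest remaining instances (the seed included),
    and keep the centroid as one generalized instance. Requires k >= 1."""
    remaining = list(positives)
    generalized = []
    while remaining:
        seed = remaining[0]
        nearest = sorted(remaining, key=lambda v: cosine(seed, v), reverse=True)[:k]
        generalized.append(centroid(nearest))
        remaining = [v for v in remaining if not any(v is n for n in nearest)]
    return generalized


def score(doc, generalized):
    """Score a document by its closest generalized instance (k-NN flavour)."""
    return max(cosine(doc, g) for g in generalized)


# Hypothetical positive training documents for one category.
positives = [
    {"goal": 1.0, "match": 1.0},
    {"goal": 1.0, "team": 1.0},
    {"stock": 1.0, "market": 1.0},
    {"stock": 1.0, "bank": 1.0},
]
generalized = build_generalized_set(positives, k=2)
on_topic = score({"goal": 1.0}, generalized)
off_topic = score({"weather": 1.0}, generalized)
```

With these inputs the four instances collapse into two generalized instances, so fewer comparisons are needed at classification time than with plain k-NN, while each comparison still uses a local rather than a global profile.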

Meta-learning models for automatic textual document categorization

Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2001

Co-Authors: Kwokyin Lai, Wai Lam

Abstract:

We investigate two meta-model approaches for the task of automatic textual document categorization. The first is the linear combination approach. Based on the idea of distilling the characteristics of how we estimate the merits of each component algorithm, we propose three different strategies for the linear combination approach. The linear combination approach makes use of only limited knowledge in the training document set. To address this limitation, we propose a second meta-model approach, called Meta-learning Using Document Feature characteristics (MUDOF), which employs a meta-learning phase using document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Extensive experiments have been conducted on a real-world document collection, and satisfactory performance is obtained.
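A linear combination meta-model can be sketched as a weighted sum of component classifier scores. The abstract does not spell out its three weighting strategies, so the sketch below assumes one plausible choice: each component's weight is proportional to its F1 score on held-out validation documents for the category. The component names, predictions, and scores are hypothetical.

```python
def f1(predicted, actual):
    """F1 of boolean predictions against boolean ground truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def merit_weights(validation_preds, actual):
    """Assumed strategy: weight each component by its normalized validation F1."""
    merits = {name: f1(preds, actual) for name, preds in validation_preds.items()}
    total = sum(merits.values())
    if total == 0:
        # No component is useful on validation data: fall back to equal weights.
        return {name: 1.0 / len(merits) for name in merits}
    return {name: m / total for name, m in merits.items()}


def combine(scores, weights):
    """Linear combination of the component scores for one document."""
    return sum(weights[name] * s for name, s in scores.items())


# Hypothetical validation results for one category.
actual = [True, True, False, False]
validation_preds = {
    "knn": [True, True, False, True],
    "linear": [True, False, False, False],
}
weights = merit_weights(validation_preds, actual)

# Hypothetical component scores for a new document.
combined = combine({"knn": 0.9, "linear": 0.2}, weights)
```

Because the weights are fixed per category from a single summary statistic, this kind of combination uses little of the information in the training set, which is the limitation the abstract attributes to the linear combination approach.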

Automatic textual document categorization using multiple similarity-based models

SIAM International Conference on Data Mining, 2001

Co-Authors: Kwokyin Lai, Wai Lam

Abstract:

We develop a similarity-based textual document categorization method called the generalized instance set (GIS) algorithm. GIS integrates the advantages of linear classifiers and the k-nearest neighbour algorithm by generalizing selected instances. To further enhance performance, we propose a meta-model framework which combines the strengths of different variants of the GIS algorithm, as well as state-of-the-art existing algorithms, using multivariate regression analysis on document feature characteristics. Document feature characteristics, derived from the training document set, capture some inherent properties of a particular category. Unlike existing categorization methods, our proposed meta-model can automatically recommend a suitable algorithm for each category based on category-specific statistical characteristics. In addition, our meta-model differs from existing multi-strategy learning in that it is not limited in the number and type of component classifiers. By flexible addition and substitution of different classifiers, incremental gains in classification performance can be obtained. Extensive experiments have been conducted. The results confirm that our meta-model approach can exploit the advantages of its component algorithms and demonstrate better performance than existing algorithms.

Author affiliations: Kwokyin Lai (corresponding author, kylai@se.cuhk.edu.hk) and Wai Lam (wlam@se.cuhk.edu.hk), Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong.

Modeling textual document classification

Systems Man and Cybernetics, 1999

Co-Authors: Wai Lam

Abstract:

We investigate existing rule-based techniques for automatic textual document classification and identify their weaknesses. We propose a new technique, known as the IBRI algorithm, that unifies the strengths of rule-based and instance-based methods. Our algorithm adapts to the characteristics of text classification problems. Experiments have been conducted to demonstrate the effectiveness of the IBRI algorithm. Moreover, we compare its performance with existing rule-based and instance-based algorithms. The results show that IBRI performs better most of the time.


P Arnaudo - One of the best experts on this subject based on the ideXlab platform.

Automated text classification using a dynamic artificial neural network model

Expert Systems With Applications, 2012

Co-Authors: M Ghiassi, M Olschimke, B Moon, P Arnaudo

Abstract:

Widespread digitization of information in today's internet age has intensified the need for effective textual document classification algorithms. Most real-life classification problems, including text classification, genetic classification, medical classification, and others, are complex in nature and are characterized by high dimensionality. Current solution strategies include Naive Bayes (NB), Neural Network (NN), Linear Least Squares Fit (LLSF), k-Nearest-Neighbor (kNN), and Support Vector Machines (SVM), with SVMs showing better results in most cases. In this paper we introduce a new approach called dynamic architecture for artificial neural networks (DAN2) as an alternative for solving textual document classification problems. DAN2 is a scalable algorithm that does not require parameter settings or network architecture configuration. To show that DAN2 is an effective and scalable alternative for text classification, we present comparative results for the Reuters-21578 benchmark dataset. Our results show that DAN2 performs very well against the current leading solutions (kNN and SVM) on established classification metrics.
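The abstract does not name the classification metrics it uses. As an assumption, the sketch below shows macro-averaged and micro-averaged F1, two measures commonly reported for multi-category text classification. The category names and document ids are made up for illustration and are not taken from Reuters-21578 results.

```python
def counts(predicted, actual):
    """True positives, false positives, false negatives for one category,
    given the sets of document ids predicted and actually in the category."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 expressed directly in terms of the contingency counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(per_category):
    """per_category maps category -> (predicted ids, actual ids).
    Macro-F1 averages per-category F1; micro-F1 pools the counts first."""
    all_counts = [counts(p, a) for p, a in per_category.values()]
    macro = sum(f1_from_counts(*c) for c in all_counts) / len(all_counts)
    micro = f1_from_counts(*(sum(col) for col in zip(*all_counts)))
    return macro, micro


# Hypothetical predictions for two categories.
per_category = {
    "earn": ({1, 2, 3, 4}, {1, 2, 3, 5}),
    "grain": ({6}, {6, 7, 8}),
}
macro, micro = macro_micro_f1(per_category)
```

Macro-averaging gives every category equal weight, so small categories influence it as much as large ones, whereas micro-averaging is dominated by the frequent categories. Reporting both is a common way to compare classifiers on skewed category distributions.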


Kwokyin Lai - One of the best experts on this subject based on the ideXlab platform.

Meta-learning models for automatic textual document categorization

Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2001

Co-Authors: Kwokyin Lai, Wai Lam

Automatic textual document categorization using multiple similarity-based models

SIAM International Conference on Data Mining, 2001

Co-Authors: Kwokyin Lai, Wai Lam


M Ghiassi - One of the best experts on this subject based on the ideXlab platform.

Automated text classification using a dynamic artificial neural network model

Expert Systems With Applications, 2012

Co-Authors: M Ghiassi, M Olschimke, B Moon, P Arnaudo


Wang Wenjie - One of the best experts on this subject based on the ideXlab platform.

Validation of textual document clustering techniques

Computer Engineering, 2007

Co-Authors: Wang Wenjie

Abstract:

This paper presents quality evaluation criteria for document clustering. Based on these criteria, three document clustering algorithms are assessed experimentally. The comparison and analysis show that the STC (Suffix Tree Clustering) algorithm performs better than the k-Means and Ant-based clustering algorithms. The better performance of STC comes from its taking linguistic properties into account when processing documents. The performance of the Ant-based clustering algorithm varies with its input variables. Linguistic properties need to be incorporated to improve the performance of Ant-based text clustering.
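The abstract does not list the evaluation criteria themselves. As an assumption, the sketch below uses purity, one widely used external criterion for comparing a clustering against known document labels. The labels and the two cluster assignments are hypothetical and do not represent the paper's data or the algorithms it compares.

```python
from collections import Counter


def purity(cluster_ids, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in by_cluster.values())
    return majority_total / len(labels)


# Hypothetical ground-truth topics for six documents.
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

# Two hypothetical clusterings of the same documents.
clean_clustering = [0, 0, 0, 1, 1, 1]
mixed_clustering = [0, 0, 1, 1, 1, 0]

clean_purity = purity(clean_clustering, labels)
mixed_purity = purity(mixed_clustering, labels)
```

Purity rewards clusterings with many small clusters, so in practice it is usually read alongside a second criterion rather than on its own.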

