The Experts below are selected from a list of 23,283 Experts worldwide ranked by the ideXlab platform
Yong Yu - One of the best experts on this subject based on the ideXlab platform.
-
Topic-bridged PLSA for Cross-domain Text Classification
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008. Co-Authors: Qiang Yang, Yong Yu. Abstract: In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.
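The topic-bridge idea in the abstract above can be sketched by running plain PLSA over both domains at once, so that source and target documents share a single set of topic-word distributions. This is an illustrative reconstruction under stated assumptions, not the authors' TPLSA (which additionally folds label information into the EM objective); the toy corpora and the nearest-neighbor classification step at the end are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa_shared_topics(X, n_topics=2, n_iter=50):
    """Plain PLSA via EM on a documents-x-words count matrix X.
    Stacking source and target documents into one X gives both
    domains the same topic-word distributions P(w|z) -- the 'bridge'."""
    n_docs, n_words = X.shape
    p_z_d = rng.random((n_docs, n_topics))       # P(z|d)
    p_w_z = rng.random((n_topics, n_words))      # P(w|z)
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), normalized over topics
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # docs x topics x words
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by observed counts
        weighted = X[:, None, :] * joint
        p_z_d = weighted.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
        p_w_z = weighted.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpora: a labeled source domain and an unlabeled target domain
# over a shared 4-word vocabulary.
X_src = np.array([[4, 3, 0, 0], [5, 2, 0, 1],   # class-0-like documents
                  [0, 1, 4, 3], [1, 0, 3, 5]])  # class-1-like documents
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[3, 4, 1, 0], [0, 0, 5, 4]])

p_z_d, _ = plsa_shared_topics(np.vstack([X_src, X_tgt]))
src_topics, tgt_topics = p_z_d[:4], p_z_d[4:]
# Classify each target doc by its nearest labeled source doc in topic space.
pred = [int(y_src[np.argmin(((src_topics - t) ** 2).sum(1))]) for t in tgt_topics]
print(pred)
```

Because the topics are estimated jointly, a target document's P(z|d) lives in the same space as the labeled source documents', which is what lets the label signal cross the domain gap.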
Chao Zhang - One of the best experts on this subject based on the ideXlab platform.
-
Weakly-Supervised Hierarchical Text Classification
National Conference on Artificial Intelligence, 2019. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
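The blocking mechanism mentioned above can be illustrated with a toy top-down pass over a class hierarchy: a document descends from the root but is "blocked" (assigned) at an internal node when no child looks confident enough. This is a hypothetical sketch of the general idea, not the paper's neural architecture; the keyword scorer, the 0.6 threshold, and the example hierarchy are all invented stand-ins for the per-node classifiers.

```python
THRESHOLD = 0.6  # assumed confidence cutoff, a free parameter

hierarchy = {                       # children of each node; leaves map to []
    "root": ["sports", "politics"],
    "sports": ["tennis", "soccer"],
    "politics": [],
    "tennis": [],
    "soccer": [],
}

keywords = {                        # toy stand-in for trained per-node classifiers
    "sports": {"sports", "game", "tennis", "soccer", "match", "goal"},
    "politics": {"politics", "vote"},
    "tennis": {"tennis", "match"},
    "soccer": {"soccer", "goal"},
}

def child_scores(node, doc):
    """Toy keyword-overlap scores, normalized over the node's children."""
    kids = hierarchy[node]
    raw = [sum(word in keywords[kid] for word in doc) + 0.1 for kid in kids]
    total = sum(raw)
    return {kid: r / total for kid, r in zip(kids, raw)}

def classify_with_blocking(doc):
    node = "root"
    while hierarchy[node]:                 # descend until a leaf...
        scores = child_scores(node, doc)
        best = max(scores, key=scores.get)
        if scores[best] < THRESHOLD:       # ...or block early at this level
            return node
        node = best
    return node

print(classify_with_blocking(["tennis", "tennis", "match"]))  # → tennis
print(classify_with_blocking(["sports", "game"]))             # → sports
```

The second document is confidently "sports" but ambiguous between "tennis" and "soccer", so it is blocked at the internal node — exactly the "proper level" behavior the abstract describes.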
-
CIKM - Weakly-Supervised Neural Text Classification
Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.
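The self-training module's refinement loop can be illustrated with one common target-sharpening recipe: square the model's current soft predictions and normalize by per-class frequency, then train the model toward these sharper targets and iterate (a DEC-style formulation). This is a hedged sketch of the general technique, not the paper's released code.

```python
import numpy as np

def sharpen(p):
    """p: (n_docs, n_classes) current predicted class distributions.
    Squaring boosts confident assignments; dividing by the column sum
    discounts classes the model already over-predicts."""
    weight = p ** 2 / p.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

# Toy soft predictions on three unlabeled documents.
p = np.array([[0.6, 0.4],
              [0.9, 0.1],
              [0.2, 0.8]])
targets = sharpen(p)
print(targets.round(3))
```

Each row of `targets` is a sharper distribution than the model's own prediction; self-training fits the model to these targets, recomputes `p`, and repeats until the assignments stabilize.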
Qiang Yang - One of the best experts on this subject based on the ideXlab platform.
-
Topic-bridged PLSA for Cross-domain Text Classification
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008. Co-Authors: Qiang Yang, Yong Yu. Abstract: In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.
Rasha Elhassan - One of the best experts on this subject based on the ideXlab platform.
-
Arabic Text Classification Process
2017. Co-Authors: Rasha Elhassan. Abstract: Due to the richness of the language, ordinary Arabic Text Classification of content is a very complex, difficult and challenging task. This has resulted in the unavailability of a benchmark Arabic corpus, limited research, and ambiguously defined processing phases in the field of Arabic Text Classification. This paper presents the complex nature of the Arabic language, poses the problem of lacking free public Arabic corpora, and explains the Classification phases described throughout the Arabic Text Classification literature.
-
Arabic Text Classification review
2015. Co-Authors: Mahmoud Ahmed, Rasha Elhassan. Abstract: Millions of documents are freely available online. These documents must first be organized systematically so that they can be properly utilized for decision making. There are many applications that help in organizing documents. Text Classification deals with how a document is assigned to its suitable class or category. Arabic is a rich and highly inflectional language, which makes ordinary analysis a very complex task. This paper focuses on published research in the field of Arabic Text Classification, presents a scientific view of its process, and compares the evaluation of the Text Classification techniques that were used.
Yu Meng - One of the best experts on this subject based on the ideXlab platform.
-
Weakly-Supervised Hierarchical Text Classification
National Conference on Artificial Intelligence, 2019. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
-
CIKM - Weakly-Supervised Neural Text Classification
Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.