The Experts below are selected from a list of 23,283 Experts worldwide ranked by the ideXlab platform
Yong Yu - One of the best experts on this subject based on the ideXlab platform.
-
Topic-bridged PLSA for Cross-domain Text Classification
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008. Co-Authors: Qiang Yang, Yong Yu. Abstract: In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.
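The topic-bridge idea in the abstract above can be sketched by running plain PLSA over both domains at once, so that source and target documents share a single set of topic-word distributions. This is an illustrative reconstruction under stated assumptions, not the authors' TPLSA (which additionally folds label information into the EM objective); the toy corpora and the nearest-neighbor classification step at the end are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa_shared_topics(X, n_topics=2, n_iter=50):
    """Plain PLSA via EM on a documents-x-words count matrix X.
    Stacking source and target documents into one X gives both
    domains the same topic-word distributions P(w|z) -- the 'bridge'."""
    n_docs, n_words = X.shape
    p_z_d = rng.random((n_docs, n_topics))       # P(z|d)
    p_w_z = rng.random((n_topics, n_words))      # P(w|z)
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), normalized over topics
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # docs x topics x words
        joint /= joint.sum(1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by observed counts
        weighted = X[:, None, :] * joint
        p_z_d = weighted.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
        p_w_z = weighted.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpora: a labeled source domain and an unlabeled target domain
# over a shared 4-word vocabulary.
X_src = np.array([[4, 3, 0, 0], [5, 2, 0, 1],   # class-0-like documents
                  [0, 1, 4, 3], [1, 0, 3, 5]])  # class-1-like documents
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[3, 4, 1, 0], [0, 0, 5, 4]])

p_z_d, _ = plsa_shared_topics(np.vstack([X_src, X_tgt]))
src_topics, tgt_topics = p_z_d[:4], p_z_d[4:]
# Classify each target doc by its nearest labeled source doc in topic space.
pred = [int(y_src[np.argmin(((src_topics - t) ** 2).sum(1))]) for t in tgt_topics]
print(pred)
```

Because the topics are estimated jointly, a target document's P(z|d) lives in the same space as the labeled source documents', which is what lets the label signal cross the domain gap.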
Chao Zhang - One of the best experts on this subject based on the ideXlab platform.
-
Weakly-Supervised Hierarchical Text Classification
National Conference on Artificial Intelligence, 2019. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
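The blocking mechanism mentioned above can be illustrated with a toy top-down pass over a class hierarchy: a document descends from the root but is "blocked" (assigned) at an internal node when no child looks confident enough. This is a hypothetical sketch of the general idea, not the paper's neural architecture; the keyword scorer, the 0.6 threshold, and the example hierarchy are all invented stand-ins for the per-node classifiers.

```python
THRESHOLD = 0.6  # assumed confidence cutoff, a free parameter

hierarchy = {                       # children of each node; leaves map to []
    "root": ["sports", "politics"],
    "sports": ["tennis", "soccer"],
    "politics": [],
    "tennis": [],
    "soccer": [],
}

keywords = {                        # toy stand-in for trained per-node classifiers
    "sports": {"sports", "game", "tennis", "soccer", "match", "goal"},
    "politics": {"politics", "vote"},
    "tennis": {"tennis", "match"},
    "soccer": {"soccer", "goal"},
}

def child_scores(node, doc):
    """Toy keyword-overlap scores, normalized over the node's children."""
    kids = hierarchy[node]
    raw = [sum(word in keywords[kid] for word in doc) + 0.1 for kid in kids]
    total = sum(raw)
    return {kid: r / total for kid, r in zip(kids, raw)}

def classify_with_blocking(doc):
    node = "root"
    while hierarchy[node]:                 # descend until a leaf...
        scores = child_scores(node, doc)
        best = max(scores, key=scores.get)
        if scores[best] < THRESHOLD:       # ...or block early at this level
            return node
        node = best
    return node

print(classify_with_blocking(["tennis", "tennis", "match"]))  # → tennis
print(classify_with_blocking(["sports", "game"]))             # → sports
```

The second document is confidently "sports" but ambiguous between "tennis" and "soccer", so it is blocked at the internal node — exactly the "proper level" behavior the abstract describes.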
-
CIKM - Weakly-Supervised Neural Text Classification
Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.
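The self-training module's refinement loop can be illustrated with one common target-sharpening recipe: square the model's current soft predictions and normalize by per-class frequency, then train the model toward these sharper targets and iterate (a DEC-style formulation). This is a hedged sketch of the general technique, not the paper's released code.

```python
import numpy as np

def sharpen(p):
    """p: (n_docs, n_classes) current predicted class distributions.
    Squaring boosts confident assignments; dividing by the column sum
    discounts classes the model already over-predicts."""
    weight = p ** 2 / p.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

# Toy soft predictions on three unlabeled documents.
p = np.array([[0.6, 0.4],
              [0.9, 0.1],
              [0.2, 0.8]])
targets = sharpen(p)
print(targets.round(3))
```

Each row of `targets` is a sharper distribution than the model's own prediction; self-training fits the model to these targets, recomputes `p`, and repeats until the assignments stabilize.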
Qiang Yang - One of the best experts on this subject based on the ideXlab platform.
-
Topic-bridged PLSA for Cross-domain Text Classification
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008. Co-Authors: Qiang Yang, Yong Yu. Abstract: In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.
Rasha Elhassan - One of the best experts on this subject based on the ideXlab platform.
-
Arabic Text Classification Process
2017. Co-Authors: Rasha Elhassan. Abstract: Due to the richness of the language, ordinary Arabic Text Classification of content is a very complex, difficult and challenging task. This has resulted in the unavailability of a benchmark Arabic corpus, limited research, and ambiguously defined processing phases in the field of Arabic Text Classification. This paper presents the complex nature of the Arabic language, poses the problem of lacking free public Arabic corpora, and explains the Classification phases described throughout the Arabic Text Classification literature.
-
Arabic Text Classification review
2015. Co-Authors: Mahmoud Ahmed, Rasha Elhassan. Abstract: Millions of documents are freely available online. These documents must first be organized systematically so that they can be properly utilized for decision making. There are many applications that help in organizing documents. Text Classification deals with how a document is assigned to its suitable class or category. Arabic is a rich and highly inflectional language, which makes ordinary analysis a very complex task. This paper focuses on published research in the field of Arabic Text Classification, presents a scientific view of its process, and compares the evaluation of the Text Classification techniques that were used.
Yu Meng - One of the best experts on this subject based on the ideXlab platform.
-
Weakly-Supervised Hierarchical Text Classification
National Conference on Artificial Intelligence, 2019. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
-
CIKM - Weakly-Supervised Neural Text Classification
Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018. Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang. Abstract: Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.