Text Classification

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies


The Experts below are selected from a list of 23,283 Experts worldwide ranked by the ideXlab platform

Yong Yu - One of the best experts on this subject based on the ideXlab platform.

  • Topic-bridged PLSA for cross-domain Text Classification
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008
    Co-Authors: Qiang Yang, Yong Yu
    Abstract:

    In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.
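The topic-bridge idea can be illustrated with a minimal PLSA sketch: fit a single PLSA model over the union of both corpora so that source and target documents share one set of topics, then label target documents in the shared topic space. This is a toy illustration of the shared-topic idea, not the authors' TPLSA algorithm (which integrates the labels into the EM objective itself); the tiny corpora and the nearest-neighbor labeling step are invented for the example.

```python
import numpy as np

def plsa(X, n_topics=2, n_iter=200, seed=0):
    """Fit PLSA by EM on a document-term count matrix X (docs x words).
    Returns P(z|d) (docs x topics) and P(w|z) (topics x words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # docs x topics x words
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        counts = X[:, None, :] * joint
        p_z_d = counts.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
        p_w_z = counts.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

# Toy corpora over one shared vocabulary: the common topics act as the "bridge".
source = np.array([[5, 4, 0, 0], [4, 5, 1, 0],   # class 0 mostly uses words 0-1
                   [0, 0, 5, 4], [0, 1, 4, 5]])  # class 1 mostly uses words 2-3
source_labels = np.array([0, 0, 1, 1])
target = np.array([[6, 3, 0, 1], [1, 0, 3, 6]])  # unlabeled target-domain docs

# Fit PLSA jointly over both domains so the topics are shared.
p_z_d, _ = plsa(np.vstack([source, target]))
src_topics, tgt_topics = p_z_d[:len(source)], p_z_d[len(source):]

# Label each target doc by its nearest labeled source doc in topic space.
preds = [int(source_labels[np.argmin(((src_topics - t) ** 2).sum(axis=1))])
         for t in tgt_topics]
print(preds)
```

The key design point mirrored here is that word-topic distributions are estimated from both domains at once, so a topic learned mostly from labeled source documents still describes the unlabeled target documents.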

Chao Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Weakly-supervised hierarchical Text Classification
    National Conference on Artificial Intelligence, 2019
    Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang
    Abstract:

    Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.
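The pseudo-document step can be sketched very simply: given only a few class-related keywords as weak supervision, synthesize labeled documents by mixing keyword draws with background vocabulary, and use them for pre-training. The keyword lists, background vocabulary, and word-mixing generator below are invented for illustration; the paper's actual generator is more sophisticated than this toy.

```python
import random

# Hypothetical weak supervision: a few class-related keywords per class.
SEED_KEYWORDS = {
    "sports":   ["game", "team", "season", "coach", "score"],
    "politics": ["election", "senate", "policy", "vote", "campaign"],
}
BACKGROUND = ["the", "a", "of", "in", "report", "today", "new", "said"]

def make_pseudo_docs(keywords, n_docs=3, doc_len=20, kw_prob=0.4, seed=0):
    """Generate pseudo-labeled documents for pre-training: each word is a
    class keyword with probability kw_prob, otherwise a background word."""
    rng = random.Random(seed)
    docs = []
    for label, kws in keywords.items():
        for _ in range(n_docs):
            words = [rng.choice(kws) if rng.random() < kw_prob
                     else rng.choice(BACKGROUND)
                     for _ in range(doc_len)]
            docs.append((" ".join(words), label))
    return docs

for text, label in make_pseudo_docs(SEED_KEYWORDS, n_docs=1):
    print(label, "->", text)
```

Pre-training on such synthetic documents gives the model a class-discriminative starting point before self-training on real unlabeled data takes over.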

  • CIKM - Weakly-Supervised Neural Text Classification
    Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018
    Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang
    Abstract:

    Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.
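The self-training module of step (2) can be sketched as a loop: fit on the seed (pseudo-labeled) set, predict on the unlabeled data, and fold only high-confidence predictions back into the training set for the next round. The nearest-centroid classifier below is a stand-in for the neural model, and the confidence threshold and 2-D toy data are invented for illustration.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """One centroid per class (a stand-in for training the neural model)."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def soft_predict(X, centroids):
    """Softmax over negative squared distances -> class probabilities."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    p = np.exp(-d)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_seed, y_seed, X_unlab, n_classes=2, rounds=5, thresh=0.9):
    X_train, y_train = X_seed, y_seed
    for _ in range(rounds):
        centroids = fit_centroids(X_train, y_train, n_classes)
        p = soft_predict(X_unlab, centroids)
        conf, pred = p.max(axis=1), p.argmax(axis=1)
        keep = conf >= thresh          # bootstrap only confident predictions
        X_train = np.vstack([X_seed, X_unlab[keep]])
        y_train = np.concatenate([y_seed, pred[keep]])
    return fit_centroids(X_train, y_train, n_classes)

rng = np.random.default_rng(0)
X_seed = np.array([[0.0, 0.0], [4.0, 4.0]])     # one pseudo-labeled point per class
y_seed = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.5, (20, 2)),    # cluster around class 0
                     rng.normal(4, 0.5, (20, 2))])   # cluster around class 1
centroids = self_train(X_seed, y_seed, X_unlab)
print(np.round(centroids, 1))
```

Each round refits from the original seed set plus the currently confident pseudo-labels, which keeps early mistakes from accumulating permanently in the training set.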

Qiang Yang - One of the best experts on this subject based on the ideXlab platform.

  • Topic-bridged PLSA for cross-domain Text Classification
    International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008
    Co-Authors: Qiang Yang, Yong Yu
    Abstract:

    In many Web applications, such as blog Classification and newsgroup Classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain is expensive and time consuming, while there may be plenty of labeled data in a related but different domain. Traditional Text Classification approaches are not able to cope well with learning across different domains. In this paper, we propose a novel cross-domain Text Classification algorithm which extends the traditional probabilistic latent semantic analysis (PLSA) algorithm to integrate labeled and unlabeled data, which come from different but related domains, into a unified probabilistic model. We call this new model Topic-bridged PLSA, or TPLSA. By exploiting the common topics between two domains, we transfer knowledge across different domains through a topic-bridge to help the Text Classification in the target domain. A unique advantage of our method is its ability to maximally mine knowledge that can be transferred between domains, resulting in superior performance when compared to other state-of-the-art Text Classification approaches. Experimental evaluation on different kinds of datasets shows that our proposed algorithm can improve the performance of cross-domain Text Classification significantly.

Rasha Elhassan - One of the best experts on this subject based on the ideXlab platform.

  • Arabic Text Classification Process
    2017
    Co-Authors: Rasha Elhassan
    Abstract:

    Due to the richness of the language, ordinary Arabic Text Classification of content is a complex, difficult and challenging task. This has resulted in the absence of a benchmark Arabic corpus, in limited research, and in ambiguous processing phases in the field of Arabic Text Classification. This paper presents the complex nature of the Arabic language, poses the problem of the lack of free public Arabic corpora, and explains the Classification phases found in the Arabic Text Classification literature.
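One of the processing phases such surveys typically cover, orthographic normalization, can be sketched as follows. The specific rules (stripping diacritics and tatweel, unifying alef variants, taa marbuta, and alef maqsura) are common choices in the Arabic Text Classification literature, not a pipeline taken from this paper.

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652]")  # fathatan .. sukun (short-vowel marks)
TATWEEL = "\u0640"                           # elongation character

def normalize_arabic(text):
    """Typical orthographic normalization applied before Arabic Text Classification."""
    text = DIACRITICS.sub("", text)                        # strip diacritics
    text = text.replace(TATWEEL, "")                       # strip tatweel
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")                # taa marbuta -> haa
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> yaa
    return text

print(normalize_arabic("العَرَبِيَّة"))  # -> العربيه
```

Normalizing these spelling variants collapses surface forms of the same word, which shrinks the vocabulary and reduces feature sparsity for the classifier.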

  • Arabic Text Classification review
    2015
    Co-Authors: Mahmoud Ahmed, Rasha Elhassan
    Abstract:

    Millions of documents are freely available online. These documents must first be organized systematically before decisions can be drawn from them, and many applications help with organizing documents. Text Classification deals with assigning a document to its suitable class or category. Arabic is a rich and very complex inflectional language, which makes ordinary analysis a very complex task. This paper focuses on the published research in the field of Arabic Text Classification, presents a scientific view of the process, and compares the evaluation of the Text Classification techniques that were used.

Yu Meng - One of the best experts on this subject based on the ideXlab platform.

  • Weakly-supervised hierarchical Text Classification
    National Conference on Artificial Intelligence, 2019
    Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang
    Abstract:

    Hierarchical Text Classification, which aims to classify Text documents into a given hierarchy, is an important task in many real-world applications. Recently, deep neural models are gaining increasing popularity for Text Classification due to their expressive power and minimum requirement for feature engineering. However, applying deep neural networks for hierarchical Text Classification remains challenging, because they heavily rely on a large amount of training data and meanwhile cannot easily determine appropriate levels of documents in the hierarchical setting. In this paper, we propose a weakly-supervised neural method for hierarchical Text Classification. Our method does not require a large amount of training data but requires only easy-to-provide weak supervision signals such as a few class-related documents or keywords. Our method effectively leverages such weak supervision signals to generate pseudo documents for model pre-training, and then performs self-training on real unlabeled data to iteratively refine the model. During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism. Experiments on three datasets from different domains demonstrate the efficacy of our method compared with a comprehensive set of baselines.

  • CIKM - Weakly-Supervised Neural Text Classification
    Proceedings of the 27th ACM International Conference on Information and Knowledge Management - CIKM '18, 2018
    Co-Authors: Yu Meng, Jiaming Shen, Chao Zhang
    Abstract:

    Deep neural networks are gaining increasing popularity for the classic Text Classification task, due to their strong expressive power and less requirement for feature engineering. Despite such attractiveness, neural Text Classification models suffer from the lack of training data in many real-world applications. Although many semi-supervised and weakly-supervised Text Classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types. In this paper, we propose a weakly-supervised method that addresses the lack of training data in neural Text Classification. Our method consists of two modules: (1) a pseudo-document generator that leverages seed information to generate pseudo-labeled documents for model pre-training, and (2) a self-training module that bootstraps on real unlabeled data for model refinement. Our method has the flexibility to handle different types of weak supervision and can be easily integrated into existing deep neural models for Text Classification. We have performed extensive experiments on three real-world datasets from different domains. The results demonstrate that our proposed method achieves inspiring performance without requiring excessive training data and outperforms baseline methods significantly.