Feature Generation

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 72,837 Experts worldwide, ranked by the ideXlab platform

Shaul Markovitch - One of the best experts on this subject based on the ideXlab platform.

  • Knowledge-Based Learning through Feature Generation
    arXiv: Learning, 2020
    Co-Authors: Michal Badian, Shaul Markovitch
    Abstract:

    Machine learning algorithms have difficulty generalizing from a small set of examples. Humans can perform such tasks by exploiting the vast amount of background knowledge they possess. One method for enhancing learning algorithms with external knowledge is Feature Generation. In this paper, we introduce a new algorithm for generating Features based on a collection of auxiliary datasets. We assume that, in addition to the training set, we have access to additional datasets. Unlike the transfer-learning setup, we do not assume that the auxiliary datasets represent learning tasks similar to our original one. The algorithm finds Features that are common to the training set and the auxiliary datasets. Based on these Features and examples from the auxiliary datasets, it induces predictors for new Features from the auxiliary datasets. The induced predictors are then added to the original training set as generated Features. Our method was tested on a variety of learning tasks, including text classification and medical prediction, and showed a significant improvement over using just the given Features.
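
    The process described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the schema-intersection step and the 1-nearest-neighbour predictor are my own simplifying assumptions.

```python
# Sketch: generate Features for the training set by inducing predictors
# for auxiliary-only Features from the Features the two datasets share.
# (Illustrative; the 1-NN predictor is a stand-in for any learner.)

def common_features(train_schema, aux_schema):
    """Features shared by the training set and the auxiliary set."""
    return sorted(set(train_schema) & set(aux_schema))

def induce_predictor(aux_rows, shared, target):
    """Induce a predictor for an auxiliary-only Feature from the shared
    Features -- here a simple 1-nearest-neighbour rule."""
    def predict(example):
        def dist(row):
            return sum((row[f] - example[f]) ** 2 for f in shared)
        return min(aux_rows, key=dist)[target]
    return predict

def generate_features(train_rows, train_schema, aux_rows, aux_schema):
    """Augment each training example with predicted values of the
    auxiliary-only Features."""
    shared = common_features(train_schema, aux_schema)
    new_feats = [f for f in aux_schema if f not in shared]
    predictors = {f: induce_predictor(aux_rows, shared, f) for f in new_feats}
    return [
        {**row, **{f"gen_{f}": p(row) for f, p in predictors.items()}}
        for row in train_rows
    ]

# Toy data: 'age' and 'bmi' are shared; 'risk' exists only in the auxiliary set.
train = [{"age": 30, "bmi": 22.0}, {"age": 60, "bmi": 31.0}]
aux = [{"age": 28, "bmi": 21.0, "risk": 0}, {"age": 62, "bmi": 30.0, "risk": 1}]
augmented = generate_features(train, ["age", "bmi"], aux, ["age", "bmi", "risk"])
print(augmented[0]["gen_risk"], augmented[1]["gen_risk"])  # 0 1
```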

  • Recursive Feature Generation for Knowledge-based Learning
    arXiv: Artificial Intelligence, 2018
    Co-Authors: Lior Friedman, Shaul Markovitch
    Abstract:

    When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using Feature Generation. Given a Feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new Feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated Features significantly improve the performance of existing learning algorithms.
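
    The construction above, in which a classifier learned over a Feature's values becomes a new Feature, can be sketched as follows. This is my own toy simplification: the "knowledge base", the concept names, and the majority-vote learner are invented for illustration and do not reflect the paper's actual algorithm.

```python
# Sketch: given a categorical Feature, define a secondary learning task
# over its values, represent each value through a toy knowledge base,
# and use the induced classifier as a new Feature for the original task.
from collections import Counter, defaultdict

KB = {  # hypothetical knowledge base: value -> set of concepts
    "lion": {"animal", "predator"},
    "wolf": {"animal", "predator"},
    "sheep": {"animal", "herbivore"},
    "oak": {"plant"},
}

def induce_value_classifier(examples, feature):
    """Secondary task: associate each KB concept with the class counts of
    the training examples whose Feature value carries that concept."""
    by_concept = defaultdict(Counter)
    for x, y in examples:
        for concept in KB.get(x[feature], ()):
            by_concept[concept][y] += 1
    def classify(value):
        votes = Counter()
        for concept in KB.get(value, ()):
            votes.update(by_concept[concept])
        return votes.most_common(1)[0][0] if votes else None
    return classify

examples = [({"animal": "lion"}, "dangerous"),
            ({"animal": "sheep"}, "safe")]
new_feature = induce_value_classifier(examples, "animal")
print(new_feature("wolf"))  # generalizes via the shared 'predator' concept
```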

  • Concept-based Feature Generation and selection for information retrieval
    National Conference on Artificial Intelligence, 2008
    Co-Authors: Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based Feature Generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality Feature selection is necessary to maintain high precision, but here we lack the labeled training data for evaluating Features that is available in supervised learning. We present a new Feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated Features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
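
    The filtering step described above can be sketched as follows. This is an illustrative simplification under my own assumptions (mean-score comparison between the two pseudo-labeled sets), not the paper's actual evaluation function.

```python
# Sketch: pseudo-relevance-feedback style Feature selection. The top-k
# documents of a bag-of-words ranking stand in for relevant documents,
# the bottom-k for non-relevant ones; a generated Feature survives only
# if it separates the two sets.

def select_features(ranked_docs, features, k=2):
    """ranked_docs: documents ordered by a bag-of-words retrieval score.
    features: dict mapping Feature name -> scoring function over a doc.
    Keep a Feature only if its mean score on the top-k (pseudo-relevant)
    docs exceeds its mean score on the bottom-k (pseudo-non-relevant)."""
    top, bottom = ranked_docs[:k], ranked_docs[-k:]
    def mean(fn, docs):
        return sum(fn(d) for d in docs) / len(docs)
    return [name for name, fn in features.items()
            if mean(fn, top) > mean(fn, bottom)]

docs = ["solar power grid", "wind energy turbine",
        "stock market news", "celebrity gossip today"]
features = {
    "mentions_energy": lambda d: int(any(w in d for w in ("power", "energy"))),
    "mentions_money":  lambda d: int("market" in d or "stock" in d),
}
print(select_features(docs, features))  # ['mentions_energy']
```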

  • Feature Generation for text categorization using world knowledge
    International Joint Conference on Artificial Intelligence, 2005
    Co-Authors: Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    We enhance machine learning algorithms for text categorization with generated Features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a Feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated Features that augment the standard bag of words. Feature Generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based Features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field.
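
    The mapping-and-generalization step can be sketched in miniature. The toy ontology and word-to-concept table below are invented stand-ins; the paper uses large ontologies such as the Open Directory rather than anything this small.

```python
# Sketch: augment a document's bag of words with matched ontology
# concepts and their ancestors (generalization up the concept hierarchy).

ONTOLOGY = {  # child concept -> parent concept (toy stand-in)
    "python": "programming",
    "java": "programming",
    "programming": "computing",
}
WORD_TO_CONCEPT = {"python": "python", "java": "java"}

def generate_concept_features(document):
    """Return the bag of words plus generated concept Features."""
    bag = set(document.lower().split())
    concepts = set()
    for word in bag:
        c = WORD_TO_CONCEPT.get(word)
        while c is not None:            # walk up the ontology: generalize
            concepts.add(c)
            c = ONTOLOGY.get(c)
    return bag | {f"concept:{c}" for c in concepts}

feats = generate_concept_features("Java tutorial")
print(sorted(feats))  # bag words plus concept:java/programming/computing
```

    A classifier trained on such augmented Features can match a "Python" document to a "Java" one through the shared `concept:programming` Feature, which is the generalization effect the abstract describes.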

Hirobumi Nishida - One of the best experts on this subject based on the ideXlab platform.

  • IWVF - Robust Structural Indexing through Quasi-Invariant Shape Signatures and Feature Generation
    Lecture Notes in Computer Science, 2001
    Co-Authors: Hirobumi Nishida
    Abstract:

    A robust method is presented for retrieval of model shapes that have parts similar to the query shape presented to the image database. Structural Feature indexing is a potential approach to efficient shape retrieval from large databases, but it is sensitive to noise, scales of observation, and local shape deformations. To improve the robustness, shape Feature Generation techniques are incorporated into structural indexing based on quasi-invariant shape signatures. The Feature transformation rules obtained by an analysis of some particular types of shape deformations are exploited to generate Features that can be extracted from deformed patterns. Effectiveness is confirmed through experimental trials with databases of boundary contours, and is validated by systematically designed experiments with a large number of synthetic data.

  • Structural Shape Indexing with Feature Generation Models
    Computer Vision and Image Understanding, 1999
    Co-Authors: Hirobumi Nishida
    Abstract:

    Structural indexing is a potential approach to efficient classification and retrieval of image patterns with respect to a very large number of models. This technique is based on the idea of distributing Features associated with model identifiers over a large data structure prepared for a model set, along with classification by voting for models with reference to the extracted Features. Essential problems caused by mapping image Features to discrete indices are that indexing is sensitive to noise, scales of observation, and local shape deformations, and that a priori knowledge and Feature distributions of corrupted instances are not available for each class when large amounts of training data are not presented. To cope with these problems, shape Feature Generation techniques are incorporated into structural indexing. An analysis of Feature transformations is carried out for some particular types of shape deformations, leading to Feature Generation rules composed of a small number of distinct cases. The rules are exploited to generate Features that can be extracted from deformed patterns caused by noise and local shape deformations. In both model database organization and classification, the Features generated by the transformation rules are used for structural indexing and voting, alongside the Features actually extracted from contours. The effectiveness of the proposed method is demonstrated by experimental trials with a large number of sample data. Furthermore, its application to shape retrieval from image databases is discussed. Shape Feature Generation significantly improves classification accuracy and efficiency.
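
    The index-and-vote scheme can be sketched as follows. The quantized integer "features" and the +/-1 deformation rule are invented for the example; Nishida's actual descriptors and transformation rules are derived from contour structure.

```python
# Sketch: structural indexing with Feature Generation. Both the extracted
# Features and their rule-generated deformation variants are distributed
# over the index; classification votes for models sharing Features.
from collections import defaultdict

def deformation_variants(feature):
    """Toy Feature Generation rule: a quantized shape code may shift by
    +/-1 under noise or local deformation."""
    return {feature, feature - 1, feature + 1}

def build_index(models):
    """Index: feature -> set of model ids, including generated variants."""
    index = defaultdict(set)
    for model_id, feats in models.items():
        for f in feats:
            for v in deformation_variants(f):
                index[v].add(model_id)
    return index

def classify(index, query_feats):
    """Vote for the model sharing the most Features with the query."""
    votes = defaultdict(int)
    for f in query_feats:
        for model_id in index[f]:
            votes[model_id] += 1
    return max(votes, key=votes.get) if votes else None

models = {"square": [4, 4, 4, 4], "triangle": [6, 6, 6]}
index = build_index(models)
print(classify(index, [5, 4, 3]))  # a noisy query still finds its model
```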

  • SSPR/SPR - Structural Indexing of Line Pictures with Feature Generation Models
    Advances in Pattern Recognition, 1998
    Co-Authors: Hirobumi Nishida
    Abstract:

    Structural indexing is a potential approach to efficient classification and retrieval of image patterns with respect to a very large number of models. Essential problems caused by mapping image Features to discrete indices are that the indexing is sensitive to noise, scales of observation, and local shape deformations, and that a priori knowledge or Feature distributions of corrupted instances are not available for each class when large amounts of training data are not presented. To cope with these problems, shape Feature Generation techniques are incorporated into structural indexing. The Feature transformation rules obtained by an analysis of some particular types of shape deformations are exploited to generate Features that can be extracted from deformed patterns. The generated Features are used in model database organization and classification. Experimental trials with a large number of sample data show that shape Feature Generation significantly improves classification accuracy and efficiency.

Tong Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Two-view Feature Generation model for semi-supervised learning
    International Conference on Machine Learning, 2007
    Co-Authors: Rie Kubota Ando, Tong Zhang
    Abstract:

    We consider a setting for discriminative semi-supervised learning where unlabeled data are used with a generative model to learn effective Feature representations for discriminative training. Within this framework, we revisit the two-view Feature Generation model of co-training and prove that the optimum predictor can be expressed as a linear combination of a few Features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM, under various data Generation conditions.
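
    One way to picture the setting is the sketch below. It is my own drastic simplification: Features for view-1 values are estimated from view-1/view-2 co-occurrence on unlabeled data, and the final predictor is linear in those constructed Features (here with hand-picked weights rather than fitted ones).

```python
# Sketch: two-view Feature construction from unlabeled data. For each
# view-1 value, the Feature vector is its empirical distribution over
# view-2 values; a linear predictor then operates on these Features.
from collections import Counter, defaultdict

def two_view_features(unlabeled_pairs):
    """Estimate, from unlabeled (view1, view2) pairs, a Feature map
    phi(v1) = empirical distribution of view-2 values given v1."""
    counts = defaultdict(Counter)
    for v1, v2 in unlabeled_pairs:
        counts[v1][v2] += 1
    v2_vals = sorted({v2 for _, v2 in unlabeled_pairs})
    def phi(v1):
        total = sum(counts[v1].values()) or 1
        return [counts[v1][v] / total for v in v2_vals]
    return phi

# Unlabeled pairs: words (view 1) with their context tags (view 2).
unlabeled = [("goal", "sports"), ("goal", "sports"), ("match", "sports"),
             ("ballot", "politics"), ("vote", "politics")]
phi = two_view_features(unlabeled)

# Linear predictor over the constructed Features; weights are hand-picked
# for the toy example (one weight per view-2 value: [politics, sports]).
weights = [-1.0, 1.0]
def predict(v1):
    score = sum(w * x for w, x in zip(weights, phi(v1)))
    return "sports" if score > 0 else "politics"

print(predict("match"), predict("vote"))  # sports politics
```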

Evgeniy Gabrilovich - One of the best experts on this subject based on the ideXlab platform.

  • Concept-based Feature Generation and selection for information retrieval
    National Conference on Artificial Intelligence, 2008
    Co-Authors: Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based Feature Generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality Feature selection is necessary to maintain high precision, but here we lack the labeled training data for evaluating Features that is available in supervised learning. We present a new Feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated Features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.

  • Feature Generation for text categorization using world knowledge
    International Joint Conference on Artificial Intelligence, 2005
    Co-Authors: Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    We enhance machine learning algorithms for text categorization with generated Features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a Feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated Features that augment the standard bag of words. Feature Generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based Features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field.

Rie Kubota Ando - One of the best experts on this subject based on the ideXlab platform.

  • Two-view Feature Generation model for semi-supervised learning
    International Conference on Machine Learning, 2007
    Co-Authors: Rie Kubota Ando, Tong Zhang
    Abstract:

    We consider a setting for discriminative semi-supervised learning where unlabeled data are used with a generative model to learn effective Feature representations for discriminative training. Within this framework, we revisit the two-view Feature Generation model of co-training and prove that the optimum predictor can be expressed as a linear combination of a few Features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM, under various data Generation conditions.
