Feature Generation

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 72,837 Experts worldwide, ranked by the ideXlab platform

Shaul Markovitch - One of the best experts on this subject based on the ideXlab platform.

  • Knowledge-Based Learning through Feature Generation
    arXiv: Learning, 2020
    Co-Authors: Michal Badian, Shaul Markovitch
    Abstract:

    Machine learning algorithms have difficulty generalizing from a small set of examples. Humans can perform such tasks by exploiting the vast amount of background knowledge they possess. One method for enhancing learning algorithms with external knowledge is Feature Generation. In this paper, we introduce a new algorithm for generating Features based on a collection of auxiliary datasets. We assume that, in addition to the training set, we have access to additional datasets. Unlike the transfer-learning setup, we do not assume that the auxiliary datasets represent learning tasks similar to our original one. The algorithm finds Features that are common to the training set and the auxiliary datasets. Based on these Features and examples from the auxiliary datasets, it induces predictors for new Features from the auxiliary datasets. The induced predictors are then added to the original training set as generated Features. Our method was tested on a variety of learning tasks, including text classification and medical prediction, and showed a significant improvement over using just the given Features.
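
    The process described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the schema-intersection step and the 1-nearest-neighbour predictor are my own simplifying assumptions.

```python
# Sketch: generate Features for the training set by inducing predictors
# for auxiliary-only Features from the Features the two datasets share.
# (Illustrative; the 1-NN predictor is a stand-in for any learner.)

def common_features(train_schema, aux_schema):
    """Features shared by the training set and the auxiliary set."""
    return sorted(set(train_schema) & set(aux_schema))

def induce_predictor(aux_rows, shared, target):
    """Induce a predictor for an auxiliary-only Feature from the shared
    Features -- here a simple 1-nearest-neighbour rule."""
    def predict(example):
        def dist(row):
            return sum((row[f] - example[f]) ** 2 for f in shared)
        return min(aux_rows, key=dist)[target]
    return predict

def generate_features(train_rows, train_schema, aux_rows, aux_schema):
    """Augment each training example with predicted values of the
    auxiliary-only Features."""
    shared = common_features(train_schema, aux_schema)
    new_feats = [f for f in aux_schema if f not in shared]
    predictors = {f: induce_predictor(aux_rows, shared, f) for f in new_feats}
    return [
        {**row, **{f"gen_{f}": p(row) for f, p in predictors.items()}}
        for row in train_rows
    ]

# Toy data: 'age' and 'bmi' are shared; 'risk' exists only in the auxiliary set.
train = [{"age": 30, "bmi": 22.0}, {"age": 60, "bmi": 31.0}]
aux = [{"age": 28, "bmi": 21.0, "risk": 0}, {"age": 62, "bmi": 30.0, "risk": 1}]
augmented = generate_features(train, ["age", "bmi"], aux, ["age", "bmi", "risk"])
print(augmented[0]["gen_risk"], augmented[1]["gen_risk"])  # 0 1
```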

  • Recursive Feature Generation for Knowledge-based Learning
    arXiv: Artificial Intelligence, 2018
    Co-Authors: Lior Friedman, Shaul Markovitch
    Abstract:

    When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using Feature Generation. Given a Feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new Feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated Features significantly improve the performance of existing learning algorithms.
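
    The construction above, in which a classifier learned over a Feature's values becomes a new Feature, can be sketched as follows. This is my own toy simplification: the "knowledge base", the concept names, and the majority-vote learner are invented for illustration and do not reflect the paper's actual algorithm.

```python
# Sketch: given a categorical Feature, define a secondary learning task
# over its values, represent each value through a toy knowledge base,
# and use the induced classifier as a new Feature for the original task.
from collections import Counter, defaultdict

KB = {  # hypothetical knowledge base: value -> set of concepts
    "lion": {"animal", "predator"},
    "wolf": {"animal", "predator"},
    "sheep": {"animal", "herbivore"},
    "oak": {"plant"},
}

def induce_value_classifier(examples, feature):
    """Secondary task: associate each KB concept with the class counts of
    the training examples whose Feature value carries that concept."""
    by_concept = defaultdict(Counter)
    for x, y in examples:
        for concept in KB.get(x[feature], ()):
            by_concept[concept][y] += 1
    def classify(value):
        votes = Counter()
        for concept in KB.get(value, ()):
            votes.update(by_concept[concept])
        return votes.most_common(1)[0][0] if votes else None
    return classify

examples = [({"animal": "lion"}, "dangerous"),
            ({"animal": "sheep"}, "safe")]
new_feature = induce_value_classifier(examples, "animal")
print(new_feature("wolf"))  # generalizes via the shared 'predator' concept
```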

  • Concept-based Feature Generation and selection for information retrieval
    National Conference on Artificial Intelligence, 2008
    Co-Authors: Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based Feature Generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality Feature selection is necessary to maintain high precision, but here we lack the labeled training data for evaluating Features that is available in supervised learning. We present a new Feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated Features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.
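
    The filtering step described above can be sketched as follows. This is an illustrative simplification under my own assumptions (mean-score comparison between the two pseudo-labeled sets), not the paper's actual evaluation function.

```python
# Sketch: pseudo-relevance-feedback style Feature selection. The top-k
# documents of a bag-of-words ranking stand in for relevant documents,
# the bottom-k for non-relevant ones; a generated Feature survives only
# if it separates the two sets.

def select_features(ranked_docs, features, k=2):
    """ranked_docs: documents ordered by a bag-of-words retrieval score.
    features: dict mapping Feature name -> scoring function over a doc.
    Keep a Feature only if its mean score on the top-k (pseudo-relevant)
    docs exceeds its mean score on the bottom-k (pseudo-non-relevant)."""
    top, bottom = ranked_docs[:k], ranked_docs[-k:]
    def mean(fn, docs):
        return sum(fn(d) for d in docs) / len(docs)
    return [name for name, fn in features.items()
            if mean(fn, top) > mean(fn, bottom)]

docs = ["solar power grid", "wind energy turbine",
        "stock market news", "celebrity gossip today"]
features = {
    "mentions_energy": lambda d: int(any(w in d for w in ("power", "energy"))),
    "mentions_money":  lambda d: int("market" in d or "stock" in d),
}
print(select_features(docs, features))  # ['mentions_energy']
```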

  • Feature Generation for text categorization using world knowledge
    International Joint Conference on Artificial Intelligence, 2005
    Co-Authors: Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    We enhance machine learning algorithms for text categorization with generated Features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a Feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated Features that augment the standard bag of words. Feature Generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based Features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field.
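
    The mapping-and-generalization step can be sketched in miniature. The toy ontology and word-to-concept table below are invented stand-ins; the paper uses large ontologies such as the Open Directory rather than anything this small.

```python
# Sketch: augment a document's bag of words with matched ontology
# concepts and their ancestors (generalization up the concept hierarchy).

ONTOLOGY = {  # child concept -> parent concept (toy stand-in)
    "python": "programming",
    "java": "programming",
    "programming": "computing",
}
WORD_TO_CONCEPT = {"python": "python", "java": "java"}

def generate_concept_features(document):
    """Return the bag of words plus generated concept Features."""
    bag = set(document.lower().split())
    concepts = set()
    for word in bag:
        c = WORD_TO_CONCEPT.get(word)
        while c is not None:            # walk up the ontology: generalize
            concepts.add(c)
            c = ONTOLOGY.get(c)
    return bag | {f"concept:{c}" for c in concepts}

feats = generate_concept_features("Java tutorial")
print(sorted(feats))  # bag words plus concept:java/programming/computing
```

    A classifier trained on such augmented Features can match a "Python" document to a "Java" one through the shared `concept:programming` Feature, which is the generalization effect the abstract describes.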

Hirobumi Nishida - One of the best experts on this subject based on the ideXlab platform.

  • IWVF - Robust Structural Indexing through Quasi-Invariant Shape Signatures and Feature Generation
    Lecture Notes in Computer Science, 2001
    Co-Authors: Hirobumi Nishida
    Abstract:

    A robust method is presented for retrieval of model shapes that have parts similar to the query shape presented to the image database. Structural Feature indexing is a potential approach to efficient shape retrieval from large databases, but it is sensitive to noise, scales of observation, and local shape deformations. To improve the robustness, shape Feature Generation techniques are incorporated into structural indexing based on quasi-invariant shape signatures. The Feature transformation rules obtained by an analysis of some particular types of shape deformations are exploited to generate Features that can be extracted from deformed patterns. Effectiveness is confirmed through experimental trials with databases of boundary contours, and is validated by systematically designed experiments with a large number of synthetic data.

  • Structural Shape Indexing with Feature Generation Models
    Computer Vision and Image Understanding, 1999
    Co-Authors: Hirobumi Nishida
    Abstract:

    Structural indexing is a potential approach to efficient classification and retrieval of image patterns with respect to a very large number of models. This technique is based on the idea of distributing Features associated with model identifiers over a large data structure prepared for a model set, along with classification by voting for models with reference to the extracted Features. Essential problems caused by mapping image Features to discrete indices are that indexing is sensitive to noise, scales of observation, and local shape deformations, and that a priori knowledge and Feature distributions of corrupted instances are not available for each class when large amounts of training data are not presented. To cope with these problems, shape Feature Generation techniques are incorporated into structural indexing. An analysis of Feature transformations is carried out for some particular types of shape deformations, leading to Feature Generation rules composed of a small number of distinct cases. The rules are exploited to generate Features that can be extracted from deformed patterns caused by noise and local shape deformations. In both model database organization and classification, the Features generated by the transformation rules are used for structural indexing and voting, alongside the Features actually extracted from contours. The effectiveness of the proposed method is demonstrated by experimental trials with a large number of sample data. Furthermore, its application to shape retrieval from image databases is discussed. Shape Feature Generation significantly improves classification accuracy and efficiency.
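
    The index-and-vote scheme can be sketched as follows. The quantized integer "features" and the +/-1 deformation rule are invented for the example; Nishida's actual descriptors and transformation rules are derived from contour structure.

```python
# Sketch: structural indexing with Feature Generation. Both the extracted
# Features and their rule-generated deformation variants are distributed
# over the index; classification votes for models sharing Features.
from collections import defaultdict

def deformation_variants(feature):
    """Toy Feature Generation rule: a quantized shape code may shift by
    +/-1 under noise or local deformation."""
    return {feature, feature - 1, feature + 1}

def build_index(models):
    """Index: feature -> set of model ids, including generated variants."""
    index = defaultdict(set)
    for model_id, feats in models.items():
        for f in feats:
            for v in deformation_variants(f):
                index[v].add(model_id)
    return index

def classify(index, query_feats):
    """Vote for the model sharing the most Features with the query."""
    votes = defaultdict(int)
    for f in query_feats:
        for model_id in index[f]:
            votes[model_id] += 1
    return max(votes, key=votes.get) if votes else None

models = {"square": [4, 4, 4, 4], "triangle": [6, 6, 6]}
index = build_index(models)
print(classify(index, [5, 4, 3]))  # a noisy query still finds its model
```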

  • SSPR/SPR - Structural Indexing of Line Pictures with Feature Generation Models
    Advances in Pattern Recognition, 1998
    Co-Authors: Hirobumi Nishida
    Abstract:

    Structural indexing is a potential approach to efficient classification and retrieval of image patterns with respect to a very large number of models. Essential problems caused by mapping image Features to discrete indices are that the indexing is sensitive to noise, scales of observation, and local shape deformations, and that a priori knowledge or Feature distributions of corrupted instances are not available for each class when large amounts of training data are not presented. To cope with these problems, shape Feature Generation techniques are incorporated into structural indexing. The Feature transformation rules obtained by an analysis of some particular types of shape deformations are exploited to generate Features that can be extracted from deformed patterns. The generated Features are used in model database organization and classification. Experimental trials with a large number of sample data show that shape Feature Generation significantly improves classification accuracy and efficiency.

Tong Zhang - One of the best experts on this subject based on the ideXlab platform.

  • Two-view Feature Generation model for semi-supervised learning
    International Conference on Machine Learning, 2007
    Co-Authors: Rie Kubota Ando, Tong Zhang
    Abstract:

    We consider a setting for discriminative semi-supervised learning where unlabeled data are used with a generative model to learn effective Feature representations for discriminative training. Within this framework, we revisit the two-view Feature Generation model of co-training and prove that the optimum predictor can be expressed as a linear combination of a few Features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM, under various data Generation conditions.
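
    One way to picture the setting is the sketch below. It is my own drastic simplification: Features for view-1 values are estimated from view-1/view-2 co-occurrence on unlabeled data, and the final predictor is linear in those constructed Features (here with hand-picked weights rather than fitted ones).

```python
# Sketch: two-view Feature construction from unlabeled data. For each
# view-1 value, the Feature vector is its empirical distribution over
# view-2 values; a linear predictor then operates on these Features.
from collections import Counter, defaultdict

def two_view_features(unlabeled_pairs):
    """Estimate, from unlabeled (view1, view2) pairs, a Feature map
    phi(v1) = empirical distribution of view-2 values given v1."""
    counts = defaultdict(Counter)
    for v1, v2 in unlabeled_pairs:
        counts[v1][v2] += 1
    v2_vals = sorted({v2 for _, v2 in unlabeled_pairs})
    def phi(v1):
        total = sum(counts[v1].values()) or 1
        return [counts[v1][v] / total for v in v2_vals]
    return phi

# Unlabeled pairs: words (view 1) with their context tags (view 2).
unlabeled = [("goal", "sports"), ("goal", "sports"), ("match", "sports"),
             ("ballot", "politics"), ("vote", "politics")]
phi = two_view_features(unlabeled)

# Linear predictor over the constructed Features; weights are hand-picked
# for the toy example (one weight per view-2 value: [politics, sports]).
weights = [-1.0, 1.0]
def predict(v1):
    score = sum(w * x for w, x in zip(weights, phi(v1)))
    return "sports" if score > 0 else "politics"

print(predict("match"), predict("vote"))  # sports politics
```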

Evgeniy Gabrilovich - One of the best experts on this subject based on the ideXlab platform.

  • Concept-based Feature Generation and selection for information retrieval
    National Conference on Artificial Intelligence, 2008
    Co-Authors: Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    Traditional information retrieval systems use query words to identify relevant documents. In difficult retrieval tasks, however, one needs access to a wealth of background knowledge. We present a method that uses Wikipedia-based Feature Generation to improve retrieval performance. Intuitively, we expect that using extensive world knowledge is likely to improve recall but may adversely affect precision. High-quality Feature selection is necessary to maintain high precision, but here we lack the labeled training data for evaluating Features that is available in supervised learning. We present a new Feature selection method inspired by pseudo-relevance feedback. We use the top-ranked and bottom-ranked documents retrieved by the bag-of-words method as representative sets of relevant and non-relevant documents. The generated Features are then evaluated and filtered on the basis of these sets. Experiments on TREC data confirm the superior performance of our method compared to the previous state of the art.

  • Feature Generation for text categorization using world knowledge
    International Joint Conference on Artificial Intelligence, 2005
    Co-Authors: Evgeniy Gabrilovich, Shaul Markovitch
    Abstract:

    We enhance machine learning algorithms for text categorization with generated Features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a Feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated Features that augment the standard bag of words. Feature Generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based Features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field.

Rie Kubota Ando - One of the best experts on this subject based on the ideXlab platform.

  • Two-view Feature Generation model for semi-supervised learning
    International Conference on Machine Learning, 2007
    Co-Authors: Rie Kubota Ando, Tong Zhang
    Abstract:

    We consider a setting for discriminative semi-supervised learning where unlabeled data are used with a generative model to learn effective Feature representations for discriminative training. Within this framework, we revisit the two-view Feature Generation model of co-training and prove that the optimum predictor can be expressed as a linear combination of a few Features constructed from unlabeled data. From this analysis, we derive methods that employ two views but are very different from co-training. Experiments show that our approach is more robust than co-training and EM, under various data Generation conditions.
