Extracting Knowledge

The experts below are selected from a list of 35,193 experts worldwide, ranked by the ideXlab platform.

Arijit Sengupta - One of the best experts on this subject based on the ideXlab platform.

  • Extracting knowledge from XML document repository: a semantic Web-based approach
    Information Technology and Management, 2007
    Co-Authors: Henry M. Kim, Arijit Sengupta
    Abstract:

    XML plays an important role as the standard language for representing structured data on the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If semantics about the data are formally represented in an ontology, then it is possible to extract knowledge: ontology definitions and axioms are applied to XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the Semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto languages of the Semantic Web, including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the Semantic Web will develop ontologies that leverage XML, provide intra-organizational value such as knowledge extraction capabilities independent of the Semantic Web, and have the potential for inter-organizational data sharing over the Semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the Semantic Web will unfold in this manner.
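
    KROX itself is not publicly distributed, so the sketch below is only an illustration of the core idea in Python, with rdflib plus the owlrl reasoner standing in for the paper's OWL/RuleML stack: lift XML play data into RDF, then let ontology axioms infer knowledge that is not explicit in the repository. The XML fragment and the tiny ontology are invented for the example.

```python
# Sketch of ontology-driven knowledge extraction from an XML repository,
# in the spirit of KROX. The XML fragment and the tiny ontology are
# illustrative assumptions, not the paper's artifacts.
# Requires: pip install rdflib owlrl
import xml.etree.ElementTree as ET

import owlrl
from rdflib import RDF, RDFS, Graph, Literal, Namespace

EX = Namespace("http://example.org/plays#")

# A fragment shaped loosely like the Shakespeare XML the paper uses.
xml_doc = """
<PLAY>
  <TITLE>Macbeth</TITLE>
  <PERSONA>MACBETH</PERSONA>
  <PERSONA>LADY MACBETH</PERSONA>
</PLAY>
"""

g = Graph()
g.bind("ex", EX)

# Ontology axioms: anything with a persona is a Drama, and every Drama
# is a Work. Neither class membership is explicit in the XML.
g.add((EX.hasPersona, RDFS.domain, EX.Drama))
g.add((EX.Drama, RDFS.subClassOf, EX.Work))

# Lift the XML into RDF triples.
play = ET.fromstring(xml_doc)
subject = EX[play.findtext("TITLE").replace(" ", "_")]
g.add((subject, EX.title, Literal(play.findtext("TITLE"))))
for persona in play.findall("PERSONA"):
    g.add((subject, EX.hasPersona, Literal(persona.text)))

# Apply the OWL-RL deductive closure: the reasoner adds inferred triples.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)

assert (subject, RDF.type, EX.Drama) in g  # inferred via rdfs:domain
assert (subject, RDF.type, EX.Work) in g   # inferred via rdfs:subClassOf
```

    Neither rdf:type fact at the end appears in the XML; both are derived by applying the ontology, which is exactly the knowledge-extraction step the abstract describes.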

Henry M. Kim - One of the best experts on this subject based on the ideXlab platform.

  • Extracting knowledge from XML document repository: a semantic Web-based approach
    Information Technology and Management, 2007
    Co-Authors: Henry M. Kim, Arijit Sengupta
    Abstract:

    XML plays an important role as the standard language for representing structured data on the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If semantics about the data are formally represented in an ontology, then it is possible to extract knowledge: ontology definitions and axioms are applied to XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the Semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto languages of the Semantic Web, including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the Semantic Web will develop ontologies that leverage XML, provide intra-organizational value such as knowledge extraction capabilities independent of the Semantic Web, and have the potential for inter-organizational data sharing over the Semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the Semantic Web will unfold in this manner.

Stephen Soderland - One of the best experts on this subject based on the ideXlab platform.

  • A probabilistic model of redundancy in information extraction
    International Joint Conference on Artificial Intelligence, 2005
    Co-Authors: Doug Downey, Oren Etzioni, Stephen Soderland
    Abstract:

    Unsupervised Information Extraction (UIE) is the task of extracting knowledge from text without using hand-tagged training examples. A fundamental problem for both UIE and supervised IE is assessing the probability that extracted information is correct. In massive corpora such as the Web, the same extraction is found repeatedly in different documents. How does this redundancy impact the probability of correctness? This paper introduces a combinatorial "balls-and-urns" model that computes the impact of sample size, redundancy, and corroboration from multiple distinct extraction rules on the probability that an extraction is correct. We describe methods for estimating the model's parameters in practice and demonstrate experimentally that for UIE the model's log likelihoods are 15 times better, on average, than those obtained by Pointwise Mutual Information (PMI) and the noisy-or model used in previous work. For supervised IE, the model's performance is comparable to that of Support Vector Machines and Logistic Regression.
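
    The full model places distributions over per-label repetition rates; the uniform special case, where every correct label shares one repetition rate and every error label another, reduces to a Bayes computation with binomial likelihoods. The sketch below implements that special case and contrasts it with the noisy-or baseline the paper compares against; all parameter values are invented for illustration.

```python
# Uniform special case of the "balls-and-urns" redundancy model from
# Downey, Etzioni, and Soderland (IJCAI 2005), sketched for illustration.
# Assumption: every correct label is drawn with probability p_c on each
# extraction and every error label with probability p_e; the paper's full
# model instead uses distributions over these repetition rates.
from math import comb

def urns_uniform(k, n, num_correct, num_error, p_c, p_e):
    """P(label is correct | it was extracted k times in n draws)."""
    like_c = comb(n, k) * p_c**k * (1 - p_c)**(n - k)  # binomial likelihood
    like_e = comb(n, k) * p_e**k * (1 - p_e)**(n - k)
    # Priors proportional to the sizes of the correct and error label sets.
    num = num_correct * like_c
    return num / (num + num_error * like_e)

def noisy_or(k, p):
    """Baseline from prior work: each of k extractions is independently
    correct with probability p."""
    return 1 - (1 - p)**k

# Illustrative numbers only: 1,000 draws, 100 correct vs. 900 error labels,
# correct labels repeating ten times more often than errors.
for k in (1, 2, 5, 10):
    print(k, round(urns_uniform(k, 1000, 100, 900, 0.01, 0.001), 4),
          round(noisy_or(k, 0.9), 4))
```

    The printout shows the paper's central observation: noisy-or ignores the sample size n, so it assigns high probability even to singleton extractions, while the urn model correctly treats a label seen only once in many draws as a likely error.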

Pieter Adriaans - One of the best experts on this subject based on the ideXlab platform.

  • Structuring and extracting knowledge for the support of hypothesis generation in molecular biology
    BMC Bioinformatics, 2009
    Co-Authors: Marco Roos, Scott M Marshall, Andrew P Gibson, Martijn J Schuemie, Edgar Meij, Sophia Katrenko, Willem Robert Van Hage, Konstantinos Krommydas, Pieter Adriaans
    Abstract:

    Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The requirement for automated support is exemplified by the difficulty of considering all relevant facts contained in the millions of documents available from PubMed. The Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes. We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence. We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-kappaB, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.
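
    The paper's proto-ontologies are not reproduced here; the Python sketch below (rdflib, with every namespace and term invented for illustration) shows the general pattern the abstract describes: a putative biological relation is reified as a node in a semantic model so that the document and text-mining score serving as its evidence can be attached to the relation itself.

```python
# Sketch of a semantic model linking a putative biological relation to
# its text-mining evidence, in the style the abstract describes. The
# 'bio' and 'ev' terms are illustrative assumptions, not the paper's
# actual OWL proto-ontologies. Requires: pip install rdflib
from rdflib import RDF, XSD, Graph, Literal, Namespace

BIO = Namespace("http://example.org/bio#")
EV = Namespace("http://example.org/evidence#")

g = Graph()
g.bind("bio", BIO)
g.bind("ev", EV)

# Reify the putative relation "NF-kappaB regulates p21" as a node, so
# that provenance can attach to the relation itself.
rel = BIO.relation_001
g.add((rel, RDF.type, BIO.PutativeRelation))
g.add((rel, BIO.subject, BIO.NF_kappaB))
g.add((rel, BIO.predicate, BIO.regulates))
g.add((rel, BIO.object, BIO.p21))

# Evidence: the source document and the extraction score produced by
# the text-mining workflow (both values are hypothetical).
g.add((rel, EV.foundIn, Literal("PMID:12345678")))
g.add((rel, EV.extractionScore, Literal(0.87, datatype=XSD.double)))
g.add((rel, EV.extractedBy, Literal("relation-mining workflow step")))

print(g.serialize(format="turtle"))
```

    Because the relation is a first-class node rather than a bare triple, models from separate experiments can later be linked into the web of knowledge the abstract envisions, with each putative relation still traceable to its evidence.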

Roberto Cerchione - One of the best experts on this subject based on the ideXlab platform.

  • Extracting knowledge from big data for sustainability: a comparison of machine learning techniques
    Sustainability, 2019
    Co-Authors: Raghu Garg, Himanshu Aggarwal, Piera Centobelli, Roberto Cerchione
    Abstract:

    At present, due to the scarcity of natural resources, society should take maximum advantage of data, information, and knowledge to achieve sustainability goals. Human existence is not possible without the proliferation of plants, which use photosynthesis to convert solar energy into chemical energy. This process is responsible for all life on Earth, and the main controlling factor for proper plant growth is soil, since it holds water, air, and all the essential nutrients for plant nourishment. However, overexposure degrades soil, so fertilizer is an essential component for maintaining soil quality. In that regard, soil analysis is a suitable method to determine soil quality: samples are examined in laboratories, which generate reports of unorganized, uninterpreted data. In this study, different big-data machine learning methods are used to extract knowledge from these data and determine fertilizer recommendation classes based on the present soil nutrient composition. For this experiment, soil analysis reports were collected from the Tata soil and water testing center. In this paper, the Mahout library is used to analyze the performance of stochastic gradient descent (SGD) and an artificial neural network (ANN) in a Hadoop environment. For a broader performance evaluation, we also ran single-machine experiments with random forest (RF), K-nearest neighbors (K-NN), regression tree (RT), support vector machine (SVM) with a polynomial kernel, and SVM with a radial basis function (RBF) kernel. Detailed experimental analysis was carried out using overall accuracy, the AUC-ROC curve (area under the receiver operating characteristic curve), mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²) as validation measurements on the soil reports dataset. The results provide a comparison of the solution classes and conclude that SGD outperforms the other approaches. Finally, the results support selecting or recommending a class that suggests a suitable fertilizer to crops for maximum production.
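
    Neither the Tata soil dataset nor the paper's code is public, so the sketch below uses scikit-learn and synthetic data as stand-ins for the Mahout/Hadoop pipeline and the soil reports. It shows the shape of the single-machine comparison: the same cross-validation folds for several of the named classifiers, scored by overall accuracy.

```python
# Sketch of the paper's single-machine classifier comparison, using
# scikit-learn and synthetic data as stand-ins for Mahout/Hadoop and the
# Tata soil reports (both substitutions are assumptions for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 8 "nutrient" features, 4 fertilizer recommendation classes.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)

models = {
    "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
    "RF": RandomForestClassifier(random_state=0),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RT": DecisionTreeClassifier(random_state=0),
    "SVM (poly)": make_pipeline(StandardScaler(), SVC(kernel="poly")),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Same 5 folds for every model, with overall accuracy as in the paper.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name:>10}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

    Scaling inside a pipeline matters for SGD, K-NN, and the SVMs, which are sensitive to feature magnitudes; the tree-based models are not, so they take the raw features directly.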