Mediated Schema

The experts below are selected from a list of 1977 experts worldwide ranked by the ideXlab platform

Ashraf Aboulnaga - One of the best experts on this subject based on the ideXlab platform.

  • CIKM - UFeed: Refining Web Data Integration Based on User Feedback
    Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017
    Co-Authors: Ahmed El-roby, Ashraf Aboulnaga
    Abstract:

    One of the main challenges in large-scale data integration for relational Schemas is creating an accurate Mediated Schema, and generating accurate semantic mappings between heterogeneous data sources and this Mediated Schema. Some applications can start with a moderately accurate Mediated Schema and mappings and refine them over time, which is referred to as the pay-as-you-go approach to data integration. Creating the Mediated Schema and mappings automatically to bootstrap the pay-as-you-go approach has been extensively studied. However, refining the Mediated Schema and mappings is still an open challenge because the data sources are usually heterogeneous and use diverse and sometimes ambiguous vocabularies. In this paper, we introduce UFeed, a system that refines relational Mediated Schemas and mappings based on user feedback over query answers. UFeed translates user actions into refinement operations that are applied to the Mediated Schema and mappings to improve their quality. We experimentally verify that UFeed improves the quality of query answers over real heterogeneous data sources extracted from the web.

  • SIGMOD Conference - Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems
    Proceedings of the 2010 international conference on Management of data - SIGMOD '10, 2010
    Co-Authors: Hatem A. Mahmoud, Ashraf Aboulnaga
    Abstract:

    A data integration system offers a single interface to multiple structured data sources. Many application contexts (e.g., searching structured data on the web) involve the integration of large numbers of structured data sources. At web scale, it is impractical to use manual or semi-automatic data integration methods, so a pay-as-you-go approach is more appropriate. A pay-as-you-go approach entails using a fully automatic approximate data integration technique to provide an initial data integration system (i.e., an initial Mediated Schema, and initial mappings from source Schemas to the Mediated Schema), and then refining the system as it gets used. Previous research has investigated automatic approximate data integration techniques, but all existing techniques require the Schemas being integrated to belong to the same conceptual domain. At web scale, it is impractical to classify Schemas into domains manually or semi-automatically, which limits the applicability of these techniques. In this paper, we present an approach for clustering Schemas into domains without any human intervention and based only on the names of attributes in the Schemas. Our clustering approach deals with uncertainty in assigning Schemas to domains using a probabilistic model. We also propose a query classifier that determines, for a given keyword query, the most relevant domains to this query. We experimentally demonstrate the effectiveness of our Schema clustering and query classification techniques.

  • ICDE - μBE: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration
    2007 IEEE 23rd International Conference on Data Engineering, 2007
    Co-Authors: Ashraf Aboulnaga, Kareem El Gebaly, Daniel Wong
    Abstract:

    The typical approach to data integration is to start by defining a common Mediated Schema, and then to map the data sources being integrated to this Schema. In Internet-scale data integration tasks, where there may be hundreds or thousands of data sources providing data of relevance to a particular domain, a better approach is to allow the user to discover the Mediated Schema and the set of sources to use through an iterative exploration of the space of possible Schemas and sources. In this paper, we present μBE, a data integration tool that helps in this iterative exploratory process by automatically choosing the data sources to include in a data integration system and defining a Mediated Schema on these sources. The data integration system desired by the user may depend on several subjective and objective criteria, and the user guides μBE towards finding this system by iteratively solving a series of constrained non-linear optimization problems, and modifying the parameters and constraints of the problem in the next iteration based on the solution found in the current iteration. Our formulation of the optimization problem is designed to make it easy for the user to provide such feedback. A simple, intuitive user interface helps the user in this process. We experimentally demonstrate that μBE is efficient and finds high-quality data integration solutions.
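
The SIGMOD '10 abstract above describes clustering Schemas into domains using only attribute names. As a rough illustration of that idea, and not the probabilistic model the authors propose, the following minimal sketch groups Schemas greedily by Jaccard similarity of their attribute-name sets. The function names, the threshold value, and the sample Schemas are all assumptions made for this example.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two attribute-name sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_schemas(schemas: Dict[str, Set[str]], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering (an assumed stand-in technique).

    A schema joins the first cluster whose pooled attribute names are
    similar enough; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    pooled: List[Set[str]] = []  # union of attribute names per cluster
    for name, attrs in schemas.items():
        for i, pool in enumerate(pooled):
            if jaccard(attrs, pool) >= threshold:
                clusters[i].append(name)
                pooled[i] = pool | attrs
                break
        else:
            clusters.append([name])
            pooled.append(set(attrs))
    return clusters


# Hypothetical schemas, each reduced to its set of attribute names.
schemas = {
    "books_a": {"title", "author", "isbn"},
    "books_b": {"title", "author", "publisher"},
    "cars_a": {"make", "model", "year"},
    "cars_b": {"make", "model", "price"},
}
clusters = cluster_schemas(schemas)
print(clusters)
```

A hard threshold like this assigns each Schema to exactly one domain, which is precisely the limitation that a probabilistic assignment is meant to avoid.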

Vânia Maria Ponte Vidal - One of the best experts on this subject based on the ideXlab platform.

  • ECBS - Modeling the Mediated Schema Constraints
    2010 17th IEEE International Conference and Workshops on Engineering of Computer Based Systems, 2010
    Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz Andre Paes P Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
    Abstract:

    In this paper, we address the problem of modeling the constraints of a Mediated Schema. We argue that, from the point of view of an application that processes the results of queries defined over the Mediated Schema, the constraints should be modeled as the greatest lower bound of the constraints of the export Schemas, after appropriate translation to a common vocabulary. This assures that the application will correctly interpret query results.

  • Revising the constraints of lightweight Mediated Schemas
    Data & Knowledge Engineering, 2010
    Co-Authors: Marco A Casanova, Tanara Lauschner, Karin Breitman, Antonio L Furtado, Luiz André P. Paes Leme, Vânia Maria Ponte Vidal
    Abstract:

    In this article, we address the problem of changing the constraints of a Mediated Schema to accommodate the set of constraints of a new export Schema. The relevance of this problem lies in that the constraints of a Mediated Schema capture the common semantics of the data sources and, as such, they must be maintained and made available to the users of the mediation environment. We first argue that such a problem can be solved by computing the greatest lower bound of two theories induced by sets of constraints, defined as the intersection of the theories. Then, for an expressive family of conceptual Schemas, we show how to efficiently decide logical implication and how to compute the greatest lower bound of two theories induced by sets of constraints. The family of conceptual Schemas we work with partly corresponds to OWL Lite and supports the equivalent of named classes, datatype and object properties, minCardinalities and maxCardinalities, InverseFunctionalProperties, subset constraints, and disjointness constraints. Such Schemas are also sufficiently expressive to encode commonly used UML constructs, such as classes, attributes, binary associations without association classes, cardinality of binary associations, multiplicity of attributes, and ISA hierarchies with disjointness, but not with complete generalizations.

  • ER - A Strategy to Revise the Constraints of the Mediated Schema
    Conceptual Modeling - ER 2009, 2009
    Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz Andre Paes P Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
    Abstract:

    In this paper, we address the problem of changing the constraints of a Mediated Schema M to accommodate the constraints of a new export Schema E0. We first show how to translate the constraints of E0 to the vocabulary of M, creating a set of constraints C0 in such a way that the Schema mapping for E0 is correct. Then, we show how to compute the new version of the constraints of M to accommodate C0 so that all Schema mappings, including that for E0, are correct. We solve both problems for subset and cardinality constraints and specific families of Schema mappings.
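
The abstracts above model the constraints of a Mediated Schema as the greatest lower bound of the export Schemas' constraints, defined as the intersection of the theories they induce. A minimal sketch of that computation follows, restricted to subset constraints only and assuming both constraint sets are already expressed in a common vocabulary. The class names and helper functions are hypothetical, and the papers cover richer constraint families than this example does.

```python
from typing import Set, Tuple

# (A, B) stands for the subset constraint "every A is a B".
Constraint = Tuple[str, str]


def closure(constraints: Set[Constraint]) -> Set[Constraint]:
    """Theory induced by subset constraints: everything implied by transitivity.

    Trivial constraints of the form (A, A) are left out.
    """
    theory = set(constraints)
    changed = True
    while changed:
        changed = False
        for a, b in list(theory):
            for c, d in list(theory):
                if b == c and a != d and (a, d) not in theory:
                    theory.add((a, d))
                    changed = True
    return theory


def greatest_lower_bound(c1: Set[Constraint], c2: Set[Constraint]) -> Set[Constraint]:
    """Constraints that follow from both sets: the intersection of the two theories."""
    return closure(c1) & closure(c2)


# Hypothetical export-schema constraints in a shared vocabulary.
export_1 = {("Student", "Person"), ("Person", "Agent")}
export_2 = {("Student", "Agent"), ("Employee", "Person")}

mediated = greatest_lower_bound(export_1, export_2)
print(mediated)
```

Intersecting the raw constraint sets here would yield nothing, since the two sets share no constraint syntactically. Taking the closure first recovers the constraint that both export Schemas imply.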

Peter Tarczy-Hornoch - One of the best experts on this subject based on the ideXlab platform.

Peter Mork - One of the best experts on this subject based on the ideXlab platform.

  • DILS - The multiple roles of ontologies in the BioMediator data integration system
    Lecture Notes in Computer Science, 2005
    Co-Authors: Peter Mork, Ron Shaker, Peter Tarczy-Hornoch
    Abstract:

    BioMediator is a data integration system that provides a common interface to multiple Internet-accessible databases containing information about genetics and molecular biology. Ontologies play several important roles in the BioMediator system: First, ontologies of genetics and molecular biology can serve as data sources. In this role concepts from the ontologies are returned as results of queries. Second, queries are posed against a Mediated Schema, which is an ontology describing the domain of discourse. User queries are expressed using the concepts in the Mediated Schema to indicate which results to retrieve. Third, each data source is an instance of the system ontology. This ontology describes information about the data sources including how often the source is updated and by whom. Finally, we are exploring the use of ontologies as a mechanism for mapping data sources to the Mediated Schema. This will facilitate extending BioMediator from a centralized integration platform to a distributed network of peers.

  • AMIA - A rule driven bi-directional translation system for remapping queries and result sets between a Mediated Schema and heterogeneous data sources.
    Proceedings. AMIA Symposium, 2002
    Co-Authors: Ron Shaker, Peter Mork, Matt Barclay, Peter Tarczy-Hornoch
    Abstract:

    As the number of online biomedical data sources increases, so too does the number of ways to access such data. The research described herein focuses on creating a data access system that provides bi-directional translation and mapping of data between heterogeneous databases and a Mediated Schema. Semantic mapping rules stored in a knowledge base are used by our generalized software to convert XML query results obtained from each data source to a common Schema representing a single ontology. We apply this approach to the domain of online genetic databases, demonstrating the system's scalability and integratability.

  • AMIA - A model for data integration systems of biomedical data applied to online genetic databases.
    Proceedings. AMIA Symposium, 2001
    Co-Authors: Peter Mork, Alon Halevy, Peter Tarczy-Hornoch
    Abstract:

    We present a general model for data integration systems using a Mediated Schema to represent commonalities in the underlying sources. These sources are mapped to the Mediated Schema using source descriptions. Users can pose queries against the Mediated Schema, allowing the system to generate automatically a query plan that enumerates and ranks all possible ways in which the query could be answered. We apply this approach to the domain of online genetic databases, demonstrating the system's ability to answer relevant queries across multiple sources.
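
The AMIA abstracts above describe sources mapped to a Mediated Schema through source descriptions, with queries and results translated in both directions. The sketch below illustrates that flow with simple attribute-renaming rules. It assumes one-to-one field mappings, and every source name, field name, and value in it is a made-up placeholder rather than part of BioMediator or the systems described.

```python
from typing import Dict, List

# Hypothetical source descriptions: mediated attribute -> source field name.
SOURCE_DESCRIPTIONS: Dict[str, Dict[str, str]] = {
    "source_a": {"gene_symbol": "symbol", "chromosome": "chr"},
    "source_b": {"gene_symbol": "GeneName", "disease": "Phenotype"},
}


def sources_for(query_attrs: List[str]) -> List[str]:
    """Sources whose description covers every attribute the query mentions."""
    return [
        source
        for source, mapping in SOURCE_DESCRIPTIONS.items()
        if all(attr in mapping for attr in query_attrs)
    ]


def to_source(source: str, query: Dict[str, str]) -> Dict[str, str]:
    """Rewrite a query over the mediated schema into the source's field names."""
    mapping = SOURCE_DESCRIPTIONS[source]
    return {mapping[attr]: value for attr, value in query.items()}


def to_mediated(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename a source record's fields back to mediated-schema attributes."""
    mapping = SOURCE_DESCRIPTIONS[source]
    return {med: record[src] for med, src in mapping.items() if src in record}


# Placeholder query and record values, for illustration only.
candidates = sources_for(["gene_symbol", "disease"])
source_query = to_source("source_b", {"gene_symbol": "G1"})
result = to_mediated("source_b", {"GeneName": "G1", "Phenotype": "D1"})
print(candidates, source_query, result)
```

A fuller planner would also combine sources that each cover only part of a query and rank the resulting plans; this sketch only selects sources that can answer the query on their own.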

Marco A Casanova - One of the best experts on this subject based on the ideXlab platform.

  • ECBS - Modeling the Mediated Schema Constraints
    2010 17th IEEE International Conference and Workshops on Engineering of Computer Based Systems, 2010
    Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz Andre Paes P Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
    Abstract:

    In this paper, we address the problem of modeling the constraints of a Mediated Schema. We argue that, from the point of view of an application that processes the results of queries defined over the Mediated Schema, the constraints should be modeled as the greatest lower bound of the constraints of the export Schemas, after appropriate translation to a common vocabulary. This assures that the application will correctly interpret query results.

  • Revising the constraints of lightweight Mediated Schemas
    Data & Knowledge Engineering, 2010
    Co-Authors: Marco A Casanova, Tanara Lauschner, Karin Breitman, Antonio L Furtado, Luiz André P. Paes Leme, Vânia Maria Ponte Vidal
    Abstract:

    In this article, we address the problem of changing the constraints of a Mediated Schema to accommodate the set of constraints of a new export Schema. The relevance of this problem lies in that the constraints of a Mediated Schema capture the common semantics of the data sources and, as such, they must be maintained and made available to the users of the mediation environment. We first argue that such a problem can be solved by computing the greatest lower bound of two theories induced by sets of constraints, defined as the intersection of the theories. Then, for an expressive family of conceptual Schemas, we show how to efficiently decide logical implication and how to compute the greatest lower bound of two theories induced by sets of constraints. The family of conceptual Schemas we work with partly corresponds to OWL Lite and supports the equivalent of named classes, datatype and object properties, minCardinalities and maxCardinalities, InverseFunctionalProperties, subset constraints, and disjointness constraints. Such Schemas are also sufficiently expressive to encode commonly used UML constructs, such as classes, attributes, binary associations without association classes, cardinality of binary associations, multiplicity of attributes, and ISA hierarchies with disjointness, but not with complete generalizations.

  • ER - A Strategy to Revise the Constraints of the Mediated Schema
    Conceptual Modeling - ER 2009, 2009
    Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz Andre Paes P Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
    Abstract:

    In this paper, we address the problem of changing the constraints of a Mediated Schema M to accommodate the constraints of a new export Schema E0. We first show how to translate the constraints of E0 to the vocabulary of M, creating a set of constraints C0 in such a way that the Schema mapping for E0 is correct. Then, we show how to compute the new version of the constraints of M to accommodate C0 so that all Schema mappings, including that for E0, are correct. We solve both problems for subset and cardinality constraints and specific families of Schema mappings.

  • SEW - Database Mediation Using Multi-agent Systems
    2008 32nd Annual IEEE Software Engineering Workshop, 2008
    Co-Authors: Luiz André P. Paes Leme, Marco A Casanova, Karin Breitman, Antonio L Furtado
    Abstract:

    This paper first proposes a multi-agent architecture to mediate access to data sources. The mediator follows the classical approach to process user queries. However, in the background, it post-processes query results to gradually construct matchings between the export Schemas and the Mediated Schema. The central theme of the paper is an extensional Schema matching strategy based on similarity functions. The paper concludes with experimental results that assess the quality of the matching strategy.
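
The SEW abstract above centers on an extensional Schema matching strategy based on similarity functions, meaning attributes are matched by comparing the values they hold rather than their names. The following minimal sketch shows one way this could look, using Jaccard similarity over observed value sets. The choice of similarity function, the threshold, and the sample attributes and values are assumptions for illustration, not the paper's actual strategy.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two value sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_attributes(
    export: Dict[str, Set[str]],
    mediated: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Match each export attribute to the mediated attribute with the most
    similar observed values, keeping only matches at or above the threshold."""
    matches: Dict[str, str] = {}
    for e_name, e_values in export.items():
        best: Optional[str] = None
        best_score = 0.0
        for m_name, m_values in mediated.items():
            score = jaccard(e_values, m_values)
            if score > best_score:
                best, best_score = m_name, score
        if best is not None and best_score >= threshold:
            matches[e_name] = best
    return matches


# Hypothetical value samples, e.g. gathered while post-processing query results.
export_values = {"nome": {"ana", "bob", "eva"}, "cidade": {"rio", "lima"}}
mediated_values = {"name": {"ana", "bob", "eva", "joe"}, "city": {"rio", "lima", "quito"}}

matches = match_attributes(export_values, mediated_values)
print(matches)
```

Because the matching depends only on values, it can pair attributes whose names are in different languages, as long as enough instances overlap.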