The experts below are selected from a list of 1977 experts worldwide ranked by the ideXlab platform.
Ashraf Aboulnaga - One of the best experts on this subject based on the ideXlab platform.
-
CIKM - UFeed: Refining Web Data Integration Based on User Feedback
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017
Co-Authors: Ahmed El-Roby, Ashraf Aboulnaga
Abstract: One of the main challenges in large-scale data integration for relational schemas is creating an accurate mediated schema, and generating accurate semantic mappings between heterogeneous data sources and this mediated schema. Some applications can start with a moderately accurate mediated schema and mappings and refine them over time, which is referred to as the pay-as-you-go approach to data integration. Creating the mediated schema and mappings automatically to bootstrap the pay-as-you-go approach has been extensively studied. However, refining the mediated schema and mappings is still an open challenge because the data sources are usually heterogeneous and use diverse and sometimes ambiguous vocabularies. In this paper, we introduce UFeed, a system that refines relational mediated schemas and mappings based on user feedback over query answers. UFeed translates user actions into refinement operations that are applied to the mediated schema and mappings to improve their quality. We experimentally verify that UFeed improves the quality of query answers over real heterogeneous data sources extracted from the web.
-
SIGMOD Conference - Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems
Proceedings of the 2010 international conference on Management of data - SIGMOD '10, 2010
Co-Authors: Hatem A. Mahmoud, Ashraf Aboulnaga
Abstract: A data integration system offers a single interface to multiple structured data sources. Many application contexts (e.g., searching structured data on the web) involve the integration of large numbers of structured data sources. At web scale, it is impractical to use manual or semi-automatic data integration methods, so a pay-as-you-go approach is more appropriate. A pay-as-you-go approach entails using a fully automatic approximate data integration technique to provide an initial data integration system (i.e., an initial mediated schema, and initial mappings from source schemas to the mediated schema), and then refining the system as it gets used. Previous research has investigated automatic approximate data integration techniques, but all existing techniques require the schemas being integrated to belong to the same conceptual domain. At web scale, it is impractical to classify schemas into domains manually or semi-automatically, which limits the applicability of these techniques. In this paper, we present an approach for clustering schemas into domains without any human intervention and based only on the names of attributes in the schemas. Our clustering approach deals with uncertainty in assigning schemas to domains using a probabilistic model. We also propose a query classifier that determines, for a given keyword query, the most relevant domains to this query. We experimentally demonstrate the effectiveness of our schema clustering and query classification techniques.
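Illustration (not from the paper): the idea of grouping schemas into domains using only attribute names can be sketched in Python as below. This is a deliberately simplified stand-in that links schemas whose attribute-name tokens have a Jaccard similarity above a threshold; it does not reproduce the probabilistic model described in the abstract. The function names, the sample schemas, and the 0.3 threshold are all assumptions made for the example.

```python
def tokens(attribute_names):
    """Lower-cased word tokens drawn from a schema's attribute names."""
    out = set()
    for name in attribute_names:
        out.update(name.lower().replace("_", " ").split())
    return out


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_schemas(schemas, threshold=0.3):
    """Group schemas into connected components of the 'similar enough' graph.

    schemas: dict mapping a schema name to its list of attribute names.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    names = list(schemas)
    toks = {n: tokens(schemas[n]) for n in names}
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(toks[a], toks[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical web schemas, described only by their attribute names.
schemas = {
    "cars1": ["make", "model", "year", "price"],
    "cars2": ["make", "model", "mileage"],
    "books1": ["title", "author", "isbn"],
    "books2": ["title", "author", "publisher", "year"],
}
print(cluster_schemas(schemas))
```

With these sample schemas, the two car schemas and the two book schemas end up in separate groups even though "cars1" and "books2" share the attribute "year", because a single shared token stays below the threshold.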
-
ICDE - μBE: User Guided Source Selection and Schema Mediation for Internet Scale Data Integration
2007 IEEE 23rd International Conference on Data Engineering, 2007
Co-Authors: Ashraf Aboulnaga, Kareem El Gebaly, Daniel Wong
Abstract: The typical approach to data integration is to start by defining a common mediated schema, and then to map the data sources being integrated to this schema. In Internet-scale data integration tasks, where there may be hundreds or thousands of data sources providing data of relevance to a particular domain, a better approach is to allow the user to discover the mediated schema and the set of sources to use through an iterative exploration of the space of possible schemas and sources. In this paper, we present μBE, a data integration tool that helps in this iterative exploratory process by automatically choosing the data sources to include in a data integration system and defining a mediated schema on these sources. The data integration system desired by the user may depend on several subjective and objective criteria, and the user guides μBE towards finding this system by iteratively solving a series of constrained non-linear optimization problems, and modifying the parameters and constraints of the problem in the next iteration based on the solution found in the current iteration. Our formulation of the optimization problem is designed to make it easy for the user to provide such feedback. A simple, intuitive user interface helps the user in this process. We experimentally demonstrate that μBE is efficient and finds high-quality data integration solutions.
Vânia Maria Ponte Vidal - One of the best experts on this subject based on the ideXlab platform.
-
ECBS - Modeling the Mediated Schema Constraints
2010 17th IEEE International Conference and Workshops on Engineering of Computer Based Systems, 2010
Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz André P. Paes Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
Abstract: In this paper, we address the problem of modeling the constraints of a mediated schema. We argue that, from the point of view of an application that processes the results of queries defined over the mediated schema, the constraints should be modeled as the greatest lower bound of the constraints of the export schemas, after appropriate translation to a common vocabulary. This assures that the application will correctly interpret query results.
-
Revising the constraints of lightweight mediated schemas
Data & Knowledge Engineering, 2010
Co-Authors: Marco A Casanova, Tanara Lauschner, Karin Breitman, Antonio L Furtado, Luiz André P. Paes Leme, Vânia Maria Ponte Vidal
Abstract: In this article, we address the problem of changing the constraints of a mediated schema to accommodate the set of constraints of a new export schema. The relevance of this problem lies in that the constraints of a mediated schema capture the common semantics of the data sources and, as such, they must be maintained and made available to the users of the mediation environment. We first argue that such a problem can be solved by computing the greatest lower bound of two theories induced by sets of constraints, defined as the intersection of the theories. Then, for an expressive family of conceptual schemas, we show how to efficiently decide logical implication and how to compute the greatest lower bound of two theories induced by sets of constraints. The family of conceptual schemas we work with partly corresponds to OWL Lite and supports the equivalent of named classes, datatype and object properties, minCardinalities and maxCardinalities, InverseFunctionalProperties, subset constraints, and disjointness constraints. Such schemas are also sufficiently expressive to encode commonly used UML constructs, such as classes, attributes, binary associations without association classes, cardinality of binary associations, multiplicity of attributes, and ISA hierarchies with disjointness, but not with complete generalizations.
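Illustration (not from the article): the notion of a greatest lower bound defined as the intersection of two theories can be sketched for the narrow case of subset constraints alone. In this minimal Python sketch, a constraint (a, b) is read as "a is a subset of b", the theory induced by a set of constraints is approximated by its transitive closure, and the greatest lower bound is the intersection of the two closures. Cardinality and disjointness constraints, which the article also covers, are left out, and the class names are hypothetical.

```python
def closure(constraints):
    """Transitive closure of subset constraints.

    Each constraint is a pair (a, b) read as 'a is a subset of b'.
    Trivial reflexive pairs (a, a) are never added.
    """
    closed = set(constraints)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and a != d and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def greatest_lower_bound(constraints_1, constraints_2):
    """Constraints implied by both sets: intersection of the two closures."""
    return closure(constraints_1) & closure(constraints_2)


# Hypothetical constraint sets for a mediated schema and a new export schema.
mediated = {("Car", "Vehicle"), ("Vehicle", "Product")}
export = {("Car", "Product"), ("Truck", "Vehicle")}

print(greatest_lower_bound(mediated, export))
```

Intersecting the raw constraint sets here would give nothing, since no pair appears in both. Closing each set first exposes the constraint that Car is a subset of Product, which both schemas imply, and that is the one constraint kept.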
-
ER - A Strategy to Revise the Constraints of the Mediated Schema
Conceptual Modeling - ER 2009, 2009
Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz André P. Paes Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
Abstract: In this paper, we address the problem of changing the constraints of a mediated schema M to accommodate the constraints of a new export schema E0. We first show how to translate the constraints of E0 to the vocabulary of M, creating a set of constraints C0 in such a way that the schema mapping for E0 is correct. Then, we show how to compute the new version of the constraints of M to accommodate C0 so that all schema mappings, including that for E0, are correct. We solve both problems for subset and cardinality constraints and specific families of schema mappings.
Peter Tarczy-Hornoch - One of the best experts on this subject based on the ideXlab platform.
-
A rule driven bi-directional translation system for remapping queries and result sets between a mediated schema and heterogeneous data sources
American Medical Informatics Association Annual Symposium, 2002
Co-Authors: Ron Shaker, Peter Mork, Matt Barclay, Peter Tarczy-Hornoch
Abstract: As the number of online biomedical data sources increases, so too does the number of ways to access such data. The research described herein focuses on creating a data access system that provides bi-directional translation and mapping of data between heterogeneous databases and a mediated schema. Semantic mapping rules stored in a knowledge base are used by our generalized software to convert XML query results obtained from each data source to a common schema representing a single ontology. We apply this approach to the domain of online genetic databases, demonstrating the system's scalability and integratability.
-
A model for data integration systems of biomedical data applied to online genetic databases
American Medical Informatics Association Annual Symposium, 2001
Co-Authors: Peter Mork, Alon Halevy, Peter Tarczy-Hornoch
Abstract: We present a general model for data integration systems using a mediated schema to represent commonalities in the underlying sources. These sources are mapped to the mediated schema using source descriptions. Users can pose queries against the mediated schema, allowing the system to generate automatically a query plan that enumerates and ranks all possible ways in which the query could be answered. We apply this approach to the domain of online genetic databases, demonstrating the system's ability to answer relevant queries across multiple sources.
Peter Mork - One of the best experts on this subject based on the ideXlab platform.
-
DILS - The multiple roles of ontologies in the BioMediator data integration system
Lecture Notes in Computer Science, 2005
Co-Authors: Peter Mork, Ron Shaker, Peter Tarczy-Hornoch
Abstract: BioMediator is a data integration system that provides a common interface to multiple Internet-accessible databases containing information about genetics and molecular biology. Ontologies play several important roles in the BioMediator system: First, ontologies of genetics and molecular biology can serve as data sources. In this role, concepts from the ontologies are returned as results of queries. Second, queries are posed against a mediated schema, which is an ontology describing the domain of discourse. User queries are expressed using the concepts in the mediated schema to indicate which results to retrieve. Third, each data source is an instance of the system ontology. This ontology describes information about the data sources including how often the source is updated and by whom. Finally, we are exploring the use of ontologies as a mechanism for mapping data sources to the mediated schema. This will facilitate extending BioMediator from a centralized integration platform to a distributed network of peers.
-
AMIA - A rule driven bi-directional translation system for remapping queries and result sets between a mediated schema and heterogeneous data sources.
Proceedings. AMIA Symposium, 2002
Co-Authors: Ron Shaker, Peter Mork, Matt Barclay, Peter Tarczy-Hornoch
Abstract: As the number of online biomedical data sources increases, so too does the number of ways to access such data. The research described herein focuses on creating a data access system that provides bi-directional translation and mapping of data between heterogeneous databases and a mediated schema. Semantic mapping rules stored in a knowledge base are used by our generalized software to convert XML query results obtained from each data source to a common schema representing a single ontology. We apply this approach to the domain of online genetic databases, demonstrating the system's scalability and integratability.
-
AMIA - A model for data integration systems of biomedical data applied to online genetic databases.
Proceedings. AMIA Symposium, 2001
Co-Authors: Peter Mork, Alon Halevy, Peter Tarczy-Hornoch
Abstract: We present a general model for data integration systems using a mediated schema to represent commonalities in the underlying sources. These sources are mapped to the mediated schema using source descriptions. Users can pose queries against the mediated schema, allowing the system to generate automatically a query plan that enumerates and ranks all possible ways in which the query could be answered. We apply this approach to the domain of online genetic databases, demonstrating the system's ability to answer relevant queries across multiple sources.
Marco A Casanova - One of the best experts on this subject based on the ideXlab platform.
-
ECBS - Modeling the Mediated Schema Constraints
2010 17th IEEE International Conference and Workshops on Engineering of Computer Based Systems, 2010
Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz André P. Paes Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
Abstract: In this paper, we address the problem of modeling the constraints of a mediated schema. We argue that, from the point of view of an application that processes the results of queries defined over the mediated schema, the constraints should be modeled as the greatest lower bound of the constraints of the export schemas, after appropriate translation to a common vocabulary. This assures that the application will correctly interpret query results.
-
Revising the constraints of lightweight mediated schemas
Data & Knowledge Engineering, 2010
Co-Authors: Marco A Casanova, Tanara Lauschner, Karin Breitman, Antonio L Furtado, Luiz André P. Paes Leme, Vânia Maria Ponte Vidal
Abstract: In this article, we address the problem of changing the constraints of a mediated schema to accommodate the set of constraints of a new export schema. The relevance of this problem lies in that the constraints of a mediated schema capture the common semantics of the data sources and, as such, they must be maintained and made available to the users of the mediation environment. We first argue that such a problem can be solved by computing the greatest lower bound of two theories induced by sets of constraints, defined as the intersection of the theories. Then, for an expressive family of conceptual schemas, we show how to efficiently decide logical implication and how to compute the greatest lower bound of two theories induced by sets of constraints. The family of conceptual schemas we work with partly corresponds to OWL Lite and supports the equivalent of named classes, datatype and object properties, minCardinalities and maxCardinalities, InverseFunctionalProperties, subset constraints, and disjointness constraints. Such schemas are also sufficiently expressive to encode commonly used UML constructs, such as classes, attributes, binary associations without association classes, cardinality of binary associations, multiplicity of attributes, and ISA hierarchies with disjointness, but not with complete generalizations.
-
ER - A Strategy to Revise the Constraints of the Mediated Schema
Conceptual Modeling - ER 2009, 2009
Co-Authors: Marco A Casanova, Tanara Lauschner, Luiz André P. Paes Leme, Karin Breitman, Antonio L Furtado, Vânia Maria Ponte Vidal
Abstract: In this paper, we address the problem of changing the constraints of a mediated schema M to accommodate the constraints of a new export schema E0. We first show how to translate the constraints of E0 to the vocabulary of M, creating a set of constraints C0 in such a way that the schema mapping for E0 is correct. Then, we show how to compute the new version of the constraints of M to accommodate C0 so that all schema mappings, including that for E0, are correct. We solve both problems for subset and cardinality constraints and specific families of schema mappings.
-
SEW - Database Mediation Using Multi-agent Systems
2008 32nd Annual IEEE Software Engineering Workshop, 2008
Co-Authors: Luiz André P. Paes Leme, Marco A Casanova, Karin Breitman, Antonio L Furtado
Abstract: This paper first proposes a multi-agent architecture to mediate access to data sources. The mediator follows the classical approach to process user queries. However, in the background, it post-processes query results to gradually construct matchings between the export schemas and the mediated schema. The central theme of the paper is an extensional schema matching strategy based on similarity functions. The paper concludes with experimental results that assess the quality of the matching strategy.
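Illustration (not from the paper): extensional schema matching compares attributes by the data values observed for them rather than by their names. The Python sketch below assumes Jaccard similarity over value sets as the similarity function and a 0.5 acceptance threshold; the paper's actual similarity functions and multi-agent machinery are not reproduced, and the attribute names and values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_attributes(export_values, mediated_values, threshold=0.5):
    """Map each export attribute to the mediated attribute with the most similar values.

    Both arguments map an attribute name to the values observed for it,
    for example values collected from past query results.
    An export attribute is left unmatched if no score reaches the threshold.
    """
    matches = {}
    for e_attr, e_vals in export_values.items():
        best_attr, best_score = None, 0.0
        for m_attr, m_vals in mediated_values.items():
            score = jaccard(set(e_vals), set(m_vals))
            if score > best_score:
                best_attr, best_score = m_attr, score
        if best_attr is not None and best_score >= threshold:
            matches[e_attr] = best_attr
    return matches


# Hypothetical observed values; the attribute names deliberately do not line up.
export_values = {"nm": ["Alice", "Bob", "Carol"], "town": ["Rio", "Lima"]}
mediated_values = {"name": ["Alice", "Bob", "Dave"], "city": ["Rio", "Lima", "Quito"]}

print(match_attributes(export_values, mediated_values))
```

Because the comparison uses values only, "nm" is paired with "name" and "town" with "city" despite the differing attribute names. Values accumulated from further query results could be fed back in to refine the matching gradually.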