Schema Mapping

The Experts below are selected from a list of 9513 Experts worldwide, ranked by the ideXlab platform.

Mauricio A. Hernández - One of the best experts on this subject based on the ideXlab platform.

  • Clio: Schema Mapping Creation and Data Exchange
    Conceptual Modeling: Foundations and Applications, 2009
    Co-Authors: Ronald Fagin, Renée J. Miller, Mauricio A. Hernández, Lucian Popa, Laura M. Haas, Yannis Velegrakis
    Abstract:

    The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural Schema Mappings to describe the relationship between data in heterogeneous Schemas, a new paradigm in which we view the Mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the Mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both Schema Mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.
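    To make "non-procedural Schema Mapping" concrete, here is a small illustrative example; the schemas are invented for exposition and are not taken from the chapter. A source relation Emp(name, dept) can be related to a target with Worker(name, did) and Dept(did, dname) by a single source-to-target tuple-generating dependency, written in LaTeX notation:

        \forall n \,\forall d \; \bigl( Emp(n, d) \rightarrow \exists i \; ( Worker(n, i) \land Dept(i, d) ) \bigr)

    The existential variable i stands for a target value (a department id) that has no source counterpart; producing transformation queries that invent such values consistently is precisely the kind of problem the data-exchange algorithms described above must solve.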

  • Orchid: Integrating Schema Mapping and ETL
    2008 IEEE 24th International Conference on Data Engineering, 2008
    Co-Authors: Stefan Deßloch, Ryan Wisnesky, Ahmed Radwan, Mauricio A. Hernández, Jindan Zhou
    Abstract:

    This paper describes Orchid, a system that converts declarative Mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both Mappings and ETL jobs are transformed into instances of this common model. As an additional benefit, instances of this common model can be optimized and deployed into multiple target environments. Orchid is being deployed in FastTrack, a data transformation toolkit in IBM Information Server.
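    To give a feel for the common operator model, here is a minimal Python sketch with invented names (Orchid's real model is richer and works in both directions, Mapping-to-ETL and ETL-to-Mapping): a Mapping compiles into a small operator tree, and the same tree can then be deployed to a concrete target environment such as SQL.

        # Hypothetical sketch of a shared operator model for Mappings and ETL
        # jobs; names and structure are invented for illustration only.
        from dataclasses import dataclass, field

        @dataclass
        class Op:
            kind: str                       # e.g. "scan", "project", "join"
            args: dict = field(default_factory=dict)
            inputs: list = field(default_factory=list)

        def mapping_to_ops(correspondences):
            """Compile attribute correspondences into a tiny operator tree."""
            return Op("project", {"cols": [tgt for _, tgt in correspondences]},
                      [Op("scan", {"table": "source"})])

        def ops_to_sql(op):
            """Deploy an operator tree to one target environment (here SQL);
            an ETL deployment would walk the same tree emitting job stages."""
            if op.kind == "scan":
                return f"SELECT * FROM {op.args['table']}"
            if op.kind == "project":
                cols = ", ".join(op.args["cols"])
                return f"SELECT {cols} FROM ({ops_to_sql(op.inputs[0])}) AS s"
            raise ValueError(op.kind)

        print(ops_to_sql(mapping_to_ops([("src.name", "name"), ("src.dept", "dept")])))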

  • Clip: A Visual Language for Explicit Schema Mappings
    2008 IEEE 24th International Conference on Data Engineering, 2008
    Co-Authors: Alessandro Raffio, Paolo Papotti, Daniele Braga, Stefano Ceri, Mauricio A. Hernández
    Abstract:

    Many data integration solutions in the market today include tools for Schema Mapping, to help users visually relate elements of different Schemas. Schema elements are connected with lines, which are interpreted as Mappings, i.e. high-level logical expressions capturing the relationship between source and target data-sets; these are compiled into queries and programs that convert source-side data instances into target-side instances. This paper describes Clip, an XML Schema Mapping tool distinguished from existing tools in that Mappings explicitly specify structural transformations in addition to value couplings. Since Clip maps hierarchical XML Schemas, lines appear naturally nested. We describe the transformation semantics associated with our "lines" and how they combine to form Mappings that are more expressive than those generated by Clio, a well-known Mapping tool. Further, we extend Clio's Mapping generation algorithms to generate Clip's Mappings.

  • Nested Mappings: Schema Mapping Reloaded
    Very Large Data Bases, 2006
    Co-Authors: Ariel Fuxman, Renée J. Miller, Paolo Papotti, H. Ho, Mauricio A. Hernández, Lucian Popa
    Abstract:

    Many problems in information integration rely on specifications, called Schema Mappings, that model the relationships between Schemas. Schema Mappings for both relational and nested data are well-known. In this work, we present a new formalism for Schema Mapping that extends these existing formalisms in two significant ways. First, our nested Mappings allow for nesting and correlation of Mappings. This results in a natural programming paradigm that often yields more accurate specifications. In particular, we show that nested Mappings can naturally preserve correlations among data that existing Mapping formalisms cannot. We also show that using nested Mappings for purposes of exchanging data from a source to a target will result in less redundancy in the target data. The second extension to the Mapping formalism is the ability to express, in a declarative way, grouping and data merging semantics. This semantics can be easily changed and customized to the integration task at hand. We present a new algorithm for the automatic generation of nested Mappings from Schema matchings (that is, simple element-to-element correspondences between Schemas). We have implemented this algorithm, along with algorithms for the generation of transformation queries (e.g., XQuery) based on the nested Mapping specification. We show that the generation algorithms scale well to large, highly nested Schemas. We also show that using nested Mappings in data exchange can drastically reduce the execution cost of producing a target instance, particularly over large data sources, and can also dramatically improve the quality of the generated data.
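    A hedged illustration of the first extension, with schemas invented for exposition: given source relations Dept(dno, dname) and Emp(dno, ename), a nested Mapping can place the employee submapping inside the scope of the department Mapping, correlating the two (written schematically in LaTeX, with Emps(D, e) standing for membership of e in the nested collection D):

        \forall d \,\forall n \; \Bigl( Dept(d, n) \rightarrow \exists D \; \bigl( Depts(n, D) \land \forall e \, ( Emp(d, e) \rightarrow Emps(D, e) ) \bigr) \Bigr)

    Because the inner submapping mentions D, each employee is grouped under its own department's collection; two independent flat Mappings, one for departments and one for department-employee pairs, cannot express this correlation, which is the source of the redundancy mentioned above.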

  • Clio: A Semi-Automatic Tool for Schema Mapping
    International Conference on Management of Data, 2001
    Co-Authors: Mauricio A. Hernández, Renée J. Miller, Laura M. Haas
    Abstract:

    We consider the integration requirements of modern data intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the Schema Mapping problem in which a source (legacy) database must be mapped into a different, but fixed, target Schema. The goal of Schema Mapping is the discovery of a query or set of queries to map source databases into the new structure. We demonstrate Clio, a new semi-automated tool for creating Schema Mappings. Clio employs a Mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes. A typical session with Clio starts with the user loading a source and a target Schema into the system. These Schemas are read from either an underlying Object-Relational database or from an XML file with an associated XML Schema. Users can then draw value correspondences Mapping source attributes into target attributes. Clio's Mapping engine incrementally produces the SQL queries that realize the Mappings implied by the correspondences. Clio provides Schema and data browsers and other feedback to allow users to understand the Mapping produced. Entering and manipulating value correspondences can be done in two modes. In the Schema View mode, users see a representation of the source and target Schema and create value correspondences by selecting Schema objects from the source and Mapping them to a target attribute. The alternative Data View mode offers a WYSIWYG interface for the Mapping process that displays example data for both the source and target tables [3]. Users may add and delete value correspondences from this view and immediately see the changes reflected in the resulting target tuples. Also, the Data View mode helps users navigate through alternative Mappings, understanding the often subtle differences between them. For example, in some cases, changing a join from an inner join to an outer join may dramatically change the resulting table. In other cases, the same change may have no effect due to constraints that hold on the source.
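    As a toy sketch of the Mapping-by-example idea (hypothetical code, not Clio's engine; Clio additionally discovers the joins needed when correspondences span several source tables):

        # Hypothetical sketch: turn value correspondences into the SQL that
        # populates a target table. Invented helper, not Clio's algorithm.
        def correspondences_to_sql(target_table, correspondences):
            """correspondences: list of (source_expression, target_attribute)."""
            select = ", ".join(f"{src} AS {tgt}" for src, tgt in correspondences)
            tables = sorted({src.split(".")[0] for src, _ in correspondences})
            return (f"INSERT INTO {target_table} "
                    f"SELECT {select} FROM {', '.join(tables)}")

        print(correspondences_to_sql(
            "Worker",
            [("emp.name", "wname"), ("emp.salary + emp.bonus", "pay")]))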

Lise Getoor - One of the best experts on this subject based on the ideXlab platform.

  • A Collective, Probabilistic Approach to Schema Mapping Using Diverse Noisy Evidence
    IEEE Transactions on Knowledge and Data Engineering, 2019
    Co-Authors: Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    Abstract:

    We propose a probabilistic approach to the problem of Schema Mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both Schema Mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of Schema Mapping selection, that is, choosing the best Mapping from a space of potential Mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen Mappings, we define a new Schema Mapping optimization problem which captures interactions between Mappings as well as inconsistencies and incompleteness in the input. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques. Our evaluation on a wide range of integration scenarios, including several real-world domains, demonstrates that CMD effectively combines data and metadata information to infer highly accurate Mappings even with significant levels of noise.
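    A loose sketch of what Mapping selection optimizes, with an invented scoring function (CMD itself reasons jointly with probabilistic inference over sets of Mappings and their interactions, rather than scoring candidates independently as done here):

        # Invented illustration of Mapping selection: score candidate mappings
        # against metadata evidence and a data example (source, target).
        def score(mapping, metadata_weight, data_example):
            source_inst, target_inst = data_example
            produced = mapping(source_inst)           # tuples the mapping derives
            covered = len(produced & target_inst)     # explained target tuples
            spurious = len(produced - target_inst)    # derived but not observed
            return metadata_weight + covered - 2 * spurious

        def select(candidates, data_example):
            # candidates: list of (mapping_function, metadata_weight) pairs
            return max(candidates, key=lambda c: score(c[0], c[1], data_example))

        example = ({("a", 1)}, {("a",)})              # (source instance, target instance)
        project = lambda inst: {(x,) for x, _ in inst}
        identity = lambda inst: set(inst)
        print(select([(project, 0.5), (identity, 0.9)], example)[1])   # -> 0.5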

  • A Collective, Probabilistic Approach to Schema Mapping
    2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
    Co-Authors: Angelika Kimmig, Alex Memory, Renée J. Miller, Lise Getoor
    Abstract:

    We propose a probabilistic approach to the problem of Schema Mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both Schema Mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of Mapping selection, that is, choosing the best Mapping from a space of potential Mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen Mappings, we define a new Schema Mapping optimization problem which captures interactions between Mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect Mappings even if a quarter of the data is inconsistent.

Phokion G Kolaitis - One of the best experts on this subject based on the ideXlab platform.

  • Reflections on Schema Mappings, Data Exchange, and Metadata Management
    Symposium on Principles of Database Systems, 2018
    Co-Authors: Phokion G Kolaitis
    Abstract:

    A Schema Mapping is a high-level specification of the relationship between two database Schemas. For the past fifteen years, Schema Mappings have played an essential role in the modeling and analysis of data exchange, data integration, and related data inter-operability tasks. The aim of this talk is to critically reflect on the body of work carried out to date, describe some of the persisting challenges, and suggest directions for future work. The first part of the talk will focus on Schema-Mapping languages, especially on the language of GLAV (global-and-local as view) Mappings and its two main sublanguages, the language of GAV (global-as-view) Mappings and the language of LAV (local-as-view) Mappings. After highlighting the fundamental structural properties of these languages, we will discuss how structural properties can actually characterize Schema-Mapping languages. The second part of the talk will focus on metadata management by considering operators on Schema Mappings, such as the composition operator and the inverse operator. We will discuss why richer languages are needed to express these operators, and will illustrate some of their uses in Schema-Mapping evolution. The third and final part of the talk will focus on the derivation of Schema Mappings from semantic information. In particular, we will discuss a variety of approaches for deriving Schema Mappings from data examples, including casting the derivation of Schema Mappings as an optimization problem and as a learning problem.
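    For readers new to the acronyms, the three languages differ in the shape of the dependencies they allow; below, \varphi_S is a conjunctive query over the source, \psi_T one over the target, and S, T are single relations (standard definitions, in LaTeX notation):

        GAV:  \forall \bar{x} \, ( \varphi_S(\bar{x}) \rightarrow T(\bar{x}) )
        LAV:  \forall \bar{x} \, ( S(\bar{x}) \rightarrow \exists \bar{y} \, \psi_T(\bar{x}, \bar{y}) )
        GLAV: \forall \bar{x} \, ( \varphi_S(\bar{x}) \rightarrow \exists \bar{y} \, \psi_T(\bar{x}, \bar{y}) )

    A GAV constraint defines a target relation directly by a query over the source, a LAV constraint describes a single source relation in terms of the target, and GLAV subsumes both.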

  • Approximation Algorithms for Schema Mapping Discovery from Data Examples
    International Conference on Management of Data, 2017
    Co-Authors: Balder Ten Cate, Phokion G Kolaitis, Kun Qian
    Abstract:

    In recent years, data examples have been at the core of several different approaches to Schema-Mapping design. In particular, Gottlob and Senellart introduced a framework for Schema-Mapping discovery from a single data example, in which the derivation of a Schema Mapping is cast as an optimization problem. Our goal is to refine and study this framework in more depth. Among other results, we design a polynomial-time log(n)-approximation algorithm for computing optimal Schema Mappings from a given set of data examples (where n is the combined size of the given data examples) for a restricted class of Schema Mappings; moreover, we show that this approximation ratio cannot be improved. In addition to the complexity-theoretic results, we implemented the aforementioned log(n)-approximation algorithm and carried out an experimental evaluation in a real-world Mapping scenario.
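    Schematically, and glossing over the precise cost model (the paper refines the one introduced by Gottlob and Senellart), discovery from data examples (I_k, J_k) is cast as the optimization

        \hat{M} \;=\; \arg\min_{M} \; \mathrm{size}(M) \,+\, \textstyle\sum_{k} \mathrm{repair}_{(I_k, J_k)}(M)

    where size(M) measures how many symbols are needed to write M down and each repair term penalizes the corrections needed for that example pair to satisfy M; the trade-off rules out both an enormous Mapping that overfits the examples and a trivial one that ignores them.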

  • Schema Mappings: A Case of Logical Dynamics in Database Theory
    Johan van Benthem on Logic and Information Dynamics, 2014
    Co-Authors: Balder Ten Cate, Phokion G Kolaitis
    Abstract:

    A Schema Mapping is a high-level specification of the structural relationships between two database Schemas. This specification is expressed in a Schema-Mapping language, which is typically a fragment of first-order logic or second-order logic. Schema Mappings have played an essential role in the study of important data-interoperability tasks, such as data integration and data exchange. In this chapter, we examine Schema Mappings as a case of logical dynamics in action. We provide a self-contained introduction to this area of research in the context of logic and databases, and focus on some of the concepts and results that may be of particular interest to the readers of this volume. After a basic introduction to Schema Mappings and Schema-Mapping languages, we discuss a series of results concerning fundamental structural properties of Schema Mappings. We then show that these structural properties can be used to obtain characterizations of various Schema-Mapping languages, in the spirit of abstract model theory. We conclude this chapter by highlighting the surprisingly subtle picture regarding compositions of Schema Mappings and the languages needed to express them.

  • Local Transformations and Conjunctive-Query Equivalence
    Symposium on Principles of Database Systems, 2012
    Co-Authors: Ronald Fagin, Phokion G Kolaitis
    Abstract:

    Over the past several decades, the study of conjunctive queries has occupied a central place in the theory and practice of database systems. In recent years, conjunctive queries have played a prominent role in the design and use of Schema Mappings for data integration and data exchange tasks. In this paper, we investigate several different aspects of conjunctive-query equivalence in the context of Schema Mappings and data exchange. In the first part of the paper, we introduce and study a notion of a local transformation between database instances that is based on conjunctive-query equivalence. We show that the chase procedure for GLAV Mappings (that is, Schema Mappings specified by source-to-target tuple-generating dependencies) is a local transformation with respect to conjunctive-query equivalence. This means that the chase procedure preserves bounded conjunctive-query equivalence, that is, if two source instances are indistinguishable using conjunctive queries of a sufficiently large size, then the target instances obtained by chasing these two source instances are also indistinguishable using conjunctive queries of a given size. Moreover, we obtain polynomial bounds on the level of indistinguishability between source instances needed to guarantee indistinguishability between the target instances produced by the chase. The locality of the chase extends to Schema Mappings specified by a second-order tuple-generating dependency (SO tgd), but does not hold for Schema Mappings whose specification includes target constraints. In the second part of the paper, we take a closer look at the composition of two GLAV Mappings. In particular, we break GLAV Mappings into a small number of well-studied classes (including LAV and GAV), and complete the picture as to when the composition of Schema Mappings from these various classes can be guaranteed to be a GLAV Mapping, and when they can be guaranteed to be conjunctive-query equivalent to a GLAV Mapping. We also show that the following problem is decidable: given a Schema Mapping specified by an SO tgd and a GLAV Mapping, are they conjunctive-query equivalent? In contrast, the following problem is known to be undecidable: given a Schema Mapping specified by an SO tgd and a GLAV Mapping, are they logically equivalent?
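    To recall what the chase does in this setting, here is a textbook example (not taken from the paper): chasing the source instance { E(a, b) } with the GLAV constraint

        \forall x \,\forall y \; \bigl( E(x, y) \rightarrow \exists z \; ( F(x, z) \land F(z, y) ) \bigr)

    produces the target instance { F(a, N), F(N, b) }, where N is a labeled null invented for the existential z. The locality result then says, roughly, that source instances indistinguishable by conjunctive queries up to a sufficiently large (polynomially bounded) size chase to target instances indistinguishable by conjunctive queries of the desired size.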

  • Schema Mapping Evolution Through Composition and Inversion
    Schema Matching and Mapping, 2011
    Co-Authors: Ronald Fagin, Phokion G Kolaitis, Lucian Popa
    Abstract:

    Mappings between different representations of data are the essential building blocks for many information integration tasks. A Schema Mapping is a high-level specification of the relationship between two Schemas, and represents a useful abstraction that specifies how the data from a source format can be transformed into a target format. The development of Schema Mappings is laborious and time consuming, even in the presence of tools that facilitate this development. At the same time, Schema evolution inevitably causes the invalidation of the existing Schema Mappings (since their Schemas change). Providing tools and methods that can facilitate the adaptation and reuse of the existing Schema Mappings in the context of the new Schemas is an important research problem. In this chapter, we show how two fundamental operators on Schema Mappings, namely composition and inversion, can be used to address the Mapping adaptation problem in the context of Schema evolution. We illustrate the applicability of the two operators in various concrete Schema evolution scenarios, and we survey the most important developments on the semantics, algorithms, and implementation of composition and inversion. We also discuss the main research questions that still remain to be addressed.
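    The composition operator discussed here has the standard set-theoretic semantics used in this line of work: identifying a Schema Mapping with the set of instance pairs it admits,

        M_{12} \circ M_{23} \;=\; \{\, (I, K) \;:\; \exists J \; ( (I, J) \in M_{12} \ \text{and}\ (J, K) \in M_{23} ) \,\}

    In the evolution scenario, if M maps source S to target T and M' captures how T evolved into T', then M \circ M' adapts the original Mapping to the new target Schema; inversion plays the dual role when the source Schema evolves.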

Renée J. Miller - One of the best experts on this subject based on the ideXlab platform.

  • Clio: Schema Mapping Creation and Data Exchange
    Conceptual Modeling: Foundations and Applications, 2009
    Co-Authors: Ronald Fagin, Renée J. Miller, Mauricio A. Hernández, Lucian Popa, Laura M. Haas, Yannis Velegrakis
    Abstract:

    The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural Schema Mappings to describe the relationship between data in heterogeneous Schemas, a new paradigm in which we view the Mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the Mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both Schema Mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.

  • A Semantic Approach to Discovering Schema Mapping Expressions
    2007 IEEE 23rd International Conference on Data Engineering, 2007
    Co-Authors: Yuan An, Renée J. Miller, Alex Borgida, John Mylopoulos
    Abstract:

    In many applications it is important to find a meaningful relationship between the Schemas of a source and target database. This relationship is expressed in terms of declarative logical expressions called Schema Mappings. The more successful previous solutions have relied on inputs such as simple element correspondences between Schemas in addition to local Schema constraints such as keys and referential integrity. In this paper, we investigate the use of an alternate source of information about Schemas, namely the presumed presence of semantics for each table, expressed in terms of a conceptual model (CM) associated with it. Our approach first compiles each CM into a graph and represents each table's semantics as a subtree in it. We then develop algorithms for discovering subgraphs that are plausible connections between those concepts/nodes in the CM graph that have attributes participating in element correspondences. A conceptual Mapping candidate is now a pair of source and target subgraphs which are semantically similar. At the end, these are converted to expressions at the database level. We offer experimental results demonstrating that, for test cases of non-trivial Mapping expressions involving Schemas from a number of domains, the "semantic" approach outperforms the traditional technique in terms of recall and especially precision.
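    A minimal sketch of the "plausible connection" step, with an invented graph representation (the paper's algorithms are considerably more refined and weigh semantic plausibility, not just path length):

        # Invented sketch: find a short chain of conceptual-model relationships
        # linking two concepts whose attributes participate in correspondences.
        from collections import deque

        def connect(cm_edges, source_concept, target_concept):
            """Breadth-first search for a shortest connecting path in the CM graph."""
            adjacency = {}
            for a, b in cm_edges:
                adjacency.setdefault(a, []).append(b)
                adjacency.setdefault(b, []).append(a)
            seen, queue = {source_concept}, deque([[source_concept]])
            while queue:
                path = queue.popleft()
                if path[-1] == target_concept:
                    return path
                for nxt in adjacency.get(path[-1], []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(path + [nxt])
            return None

        print(connect([("Person", "Dept"), ("Dept", "Project")], "Person", "Project"))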

  • Retrospective on Clio: Schema Mapping and Data Exchange in Practice
    Description Logics, 2007
    Co-Authors: Renée J. Miller
    Abstract:

    Clio is a joint research project between the University of Toronto and IBM Almaden Research Center started in 1999 to address both foundational and systems issues related to the management of heterogeneous data. In this talk, I will take a look back over the last eight years of this project to review its achievements, the lessons learned, and the challenges that remain.

  • Nested Mappings: Schema Mapping Reloaded
    Very Large Data Bases, 2006
    Co-Authors: Ariel Fuxman, Renée J. Miller, Paolo Papotti, H. Ho, Mauricio A. Hernández, Lucian Popa
    Abstract:

    Many problems in information integration rely on specifications, called Schema Mappings, that model the relationships between Schemas. Schema Mappings for both relational and nested data are well-known. In this work, we present a new formalism for Schema Mapping that extends these existing formalisms in two significant ways. First, our nested Mappings allow for nesting and correlation of Mappings. This results in a natural programming paradigm that often yields more accurate specifications. In particular, we show that nested Mappings can naturally preserve correlations among data that existing Mapping formalisms cannot. We also show that using nested Mappings for purposes of exchanging data from a source to a target will result in less redundancy in the target data. The second extension to the Mapping formalism is the ability to express, in a declarative way, grouping and data merging semantics. This semantics can be easily changed and customized to the integration task at hand. We present a new algorithm for the automatic generation of nested Mappings from Schema matchings (that is, simple element-to-element correspondences between Schemas). We have implemented this algorithm, along with algorithms for the generation of transformation queries (e.g., XQuery) based on the nested Mapping specification. We show that the generation algorithms scale well to large, highly nested Schemas. We also show that using nested Mappings in data exchange can drastically reduce the execution cost of producing a target instance, particularly over large data sources, and can also dramatically improve the quality of the generated data.

  • Clio: A Semi-Automatic Tool for Schema Mapping
    International Conference on Management of Data, 2001
    Co-Authors: Mauricio A. Hernández, Renée J. Miller, Laura M. Haas
    Abstract:

    We consider the integration requirements of modern data intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the Schema Mapping problem in which a source (legacy) database must be mapped into a different, but fixed, target Schema. The goal of Schema Mapping is the discovery of a query or set of queries to map source databases into the new structure. We demonstrate Clio, a new semi-automated tool for creating Schema Mappings. Clio employs a Mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes. A typical session with Clio starts with the user loading a source and a target Schema into the system. These Schemas are read from either an underlying Object-Relational database or from an XML file with an associated XML Schema. Users can then draw value correspondences Mapping source attributes into target attributes. Clio's Mapping engine incrementally produces the SQL queries that realize the Mappings implied by the correspondences. Clio provides Schema and data browsers and other feedback to allow users to understand the Mapping produced. Entering and manipulating value correspondences can be done in two modes. In the Schema View mode, users see a representation of the source and target Schema and create value correspondences by selecting Schema objects from the source and Mapping them to a target attribute. The alternative Data View mode offers a WYSIWYG interface for the Mapping process that displays example data for both the source and target tables [3]. Users may add and delete value correspondences from this view and immediately see the changes reflected in the resulting target tuples. Also, the Data View mode helps users navigate through alternative Mappings, understanding the often subtle differences between them. For example, in some cases, changing a join from an inner join to an outer join may dramatically change the resulting table. In other cases, the same change may have no effect due to constraints that hold on the source.

Jiu-jye Chen - One of the best experts on this subject based on the ideXlab platform.

  • A Service Framework for Multi-Tenant Enterprise Application in SaaS Environments
    International Conference on Software Engineering, 2014
    Co-Authors: Chun-feng Liao, Kung Chen, Jiu-jye Chen
    Abstract:

    In recent years, Software as a service (SaaS), a service model for cloud computing, has received a lot of attention. As designing a multi-tenant enterprise application in SaaS environments is a non-trivial task, we propose a service framework to deal with three common issues for designing multi-tenant enterprise SaaS applications: tenant context storage and propagation, Schema-Mapping, and the integration of an ORM framework. A prototype and a sample SaaS application have been implemented to verify the feasibility of our framework. In addition, two tenant-specific virtual applications are constructed to demonstrate multi-tenancy. Finally, we conduct a set of experiments to assess the overheads of making an enterprise application multi-tenant enabled.
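    As a small illustration of the first issue, tenant context storage and propagation (hypothetical Python, not the paper's framework, which targets enterprise application stacks), a request-scoped context variable can carry the tenant id down to every data access:

        # Hypothetical sketch of tenant context propagation using only the
        # Python standard library; invented names throughout.
        import contextvars

        current_tenant = contextvars.ContextVar("current_tenant")

        def handle_request(tenant_id, work):
            token = current_tenant.set(tenant_id)     # bind tenant for this request
            try:
                return work()
            finally:
                current_tenant.reset(token)           # avoid leaking across requests

        def fetch_orders():
            # Data-access code reads the ambient tenant and scopes its query.
            tenant = current_tenant.get()
            return f"SELECT * FROM orders WHERE tenant_id = '{tenant}'"

        print(handle_request("acme", fetch_orders))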

  • Toward a tenant-aware query rewriting engine for Universal Table Schema-Mapping
    4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012
    Co-Authors: Chun-feng Liao, Kung Chen, Jiu-jye Chen
    Abstract:

    In software as a service (SaaS) environments, designing a multi-tenant data architecture that supports a shared database with custom extensions is a non-trivial task. A general approach to support such an architecture is a middleware-level facility that supports the Mapping of multiple single-tenant logical Schemas in the application to one multi-tenant physical Schema in the database. In this paper we follow this approach and report our preliminary results on the design and analysis of a query rewriting engine that can transparently transform tenant-specific logical queries into corresponding physical queries for Universal Table, a widely adopted and industry-verified Schema-Mapping technique. A prototype and a sample SaaS application are implemented to verify the feasibility of the design of the query rewriting engine. In addition, performance analysis results that can be used to predict the overhead of Schema-Mapping in the engine are also reported.
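    To illustrate the flavor of such rewriting with a toy sketch (invented metadata layout; the engine in the paper parses full SQL and consults per-tenant Schema catalogs): in the Universal Table layout, all tenants' rows share one wide table with generic value columns, so a tenant's logical query is rewritten against that table:

        # Toy sketch of Universal Table query rewriting; the physical table
        # "universal" and the column mapping below are invented for illustration.
        UNIVERSAL = "universal"   # columns: tenant, table_name, col1, col2, ...

        def rewrite(tenant, logical_table, logical_cols, column_map):
            """column_map: logical column name -> generic physical column."""
            select = ", ".join(f"{column_map[c]} AS {c}" for c in logical_cols)
            return (f"SELECT {select} FROM {UNIVERSAL} "
                    f"WHERE tenant = '{tenant}' AND table_name = '{logical_table}'")

        print(rewrite("acme", "invoice", ["amount", "due_date"],
                      {"amount": "col1", "due_date": "col2"}))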