Foreign Key Value

Carlos Ordonez - One of the best experts on this subject based on the ideXlab platform.

  • Extended aggregations for databases with referential integrity issues
    Data & Knowledge Engineering, 2010
    Co-Authors: Javier García-García, Carlos Ordonez
    Abstract:

    Querying inconsistent databases remains a broad and difficult problem. In this work, we study how to improve aggregations computed on databases with referential errors in the context of database integration, where each source database has different tables, columns with similar content across multiple databases, but different referential integrity constraints. Thus, a query in an integrated database may involve tables and columns with referential integrity errors. In a data warehouse, even though the ETL processes fix referential integrity errors, this is generally done by inserting "dummy" records into the dimension tables corresponding to such invalid foreign keys, thereby artificially enforcing referential integrity. When two tables are joined and aggregations are computed, rows with an invalid or null foreign key value are skipped, effectively eliminating potentially valuable information. With that motivation in mind, we extend SQL aggregate functions computed over tables with referential integrity issues to return complete answer sets in the sense that no row is excluded. We associate with each referenced key in the dimension table a probability that invalid or null foreign keys refer to it. Our main idea is to compute aggregations over joined tables including rows with invalid or null references by distributing their contribution to aggregation totals, based on probabilities computed over correct foreign keys. Experiments with real and synthetic databases evaluate the usefulness, accuracy and performance of our extended aggregations.

  • Estimating and bounding aggregations in databases with referential integrity errors
    Data Warehousing and OLAP, 2008
    Co-Authors: Javier García-García, Carlos Ordonez
    Abstract:

    Database integration builds on tables coming from multiple databases by creating a single view of all these data. Each database has different tables, columns with similar content across databases, and different referential integrity constraints. Thus, a query in an integrated database is likely to involve tables and columns with referential integrity errors. In a data warehouse environment, even though the ETL processes take care of the referential integrity errors, in many scenarios this is done by including "dummy" records in the dimension tables used to relate to the fact tables with referential errors. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated in a group marked as undefined, effectively discarding potentially valuable information. With that motivation in mind, we extend aggregate functions computed over tables with referential integrity errors on OLAP databases to return complete answer sets in the sense that no tuple is excluded. We associate with each valid reference the probability that an invalid reference actually corresponds to that correct reference. The main idea of our work is that, in certain contexts, it is possible to use tuples with invalid references by taking this probability into account. This way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints.
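
The idea shared by the two abstracts above, distributing the contribution of rows with invalid or null foreign keys across the valid referenced keys, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the in-memory row format, and in particular the choice of probability (the relative frequency of each key among the rows with valid foreign keys) are hypothetical and are not taken from the papers.

```python
def plain_join_sum(fact_rows, valid_keys):
    """SUM per referenced key, as an inner join would compute it.

    fact_rows: iterable of (foreign_key, amount) pairs.
    valid_keys: keys that exist in the dimension table.
    Rows whose foreign key is null (None) or invalid are skipped.
    """
    totals = {key: 0.0 for key in valid_keys}
    for fk, amount in fact_rows:
        if fk in totals:
            totals[fk] += amount
    return totals


def extended_sum(fact_rows, valid_keys):
    """SUM per referenced key that also accounts for orphan rows.

    Assumption (hypothetical): the probability that an orphan row refers
    to a key is that key's share of the rows with valid foreign keys.
    """
    totals = plain_join_sum(fact_rows, valid_keys)
    counts = {key: 0 for key in valid_keys}
    orphan_total = 0.0
    for fk, amount in fact_rows:
        if fk in counts:
            counts[fk] += 1
        else:
            orphan_total += amount  # null or invalid reference

    n_valid = sum(counts.values())
    if n_valid == 0:
        # No valid references, so no probabilities can be estimated.
        return totals

    for key, count in counts.items():
        totals[key] += orphan_total * count / n_valid
    return totals


rows = [("A", 10.0), ("A", 20.0), ("B", 30.0), (None, 8.0), ("Z", 4.0)]
dimension_keys = {"A", "B", "C"}
print(plain_join_sum(rows, dimension_keys))
print(extended_sum(rows, dimension_keys))
```

In this toy data, the plain join drops the two orphan rows (a total of 12.0), while the extended version spreads that amount over keys A and B in a 2:1 ratio, so the grand total matches the sum over all fact rows.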

Javier García-García - One of the best experts on this subject based on the ideXlab platform.

  • Extended aggregations for databases with referential integrity issues
    Data & Knowledge Engineering, 2010
    Co-Authors: Javier García-García, Carlos Ordonez

  • Estimating and bounding aggregations in databases with referential integrity errors
    Data Warehousing and OLAP, 2008
    Co-Authors: Javier García-García, Carlos Ordonez