Data Quality Rule

The experts below are selected from a list of 12 experts worldwide, ranked by the ideXlab platform.

Gao Juntao - One of the best experts on this subject based on the ideXlab platform.

  • A Novel Data Quality Controlling and Assessing Model Based on Rules
    2010 Third International Symposium on Electronic Commerce and Security, 2010
    Co-Authors: Huang Gang, Gao Juntao
    Abstract:

    As a resource, data is the foundation of information system construction and application. Following the principle of "garbage in, garbage out", data must be reliable, free of errors, and an accurate reflection of the real situation if it is to support correct decisions. In practice, however, a variety of causes introduce poor-quality, dirty data into existing business systems, and this dirty data is a major factor undermining sound decision-making. To address this, this paper builds a metadata-based data quality rule base that improves on the traditional quality-control model, proposes a more practically applicable weighted assessment algorithm, and constructs a three-tier data quality assessment system model, drawing on a study of the definition and classification of quality, assessment algorithms, metadata, and control theory. The model has been confirmed to achieve comprehensive data quality management and control in practical oilfield applications.
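
    The paper gives no algorithm listing here, but a minimal sketch of a weighted, rule-based quality assessment might look like the following Python; the rule names, weights, and pass counts are illustrative assumptions, not values from the paper:

    # Weighted quality score: each rule contributes its pass rate,
    # scaled by an expert-assigned weight. All specifics are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class QualityRule:
        name: str      # identifier of the rule, e.g. a not-null check
        weight: float  # relative importance assigned by a domain expert
        passed: int    # number of records that satisfied the rule
        total: int     # number of records checked

    def weighted_quality_score(rules: list[QualityRule]) -> float:
        """Weighted average of per-rule pass rates, in [0, 1]."""
        total_weight = sum(r.weight for r in rules)
        return sum(r.weight * r.passed / r.total for r in rules) / total_weight

    # Hypothetical rules over an oilfield-style dataset of 1,000 records.
    rules = [
        QualityRule("well_id_not_null", weight=3.0, passed=980, total=1000),
        QualityRule("depth_in_valid_range", weight=2.0, passed=950, total=1000),
        QualityRule("date_is_iso8601", weight=1.0, passed=990, total=1000),
    ]
    print(f"overall quality score: {weighted_quality_score(rules):.3f}")
    # overall quality score: 0.972

    In a three-tier model such as the paper describes, scores of this kind would presumably be computed per rule, aggregated per data object, and then rolled up into an overall assessment.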

Thierno Mahamoudou Diallo - One of the best experts on this subject based on the ideXlab platform.

  • Discovering Data Quality Rules in a Master Data Management Context
    2013
    Co-Authors: Thierno Mahamoudou Diallo
    Abstract:

    Dirty data continues to be a serious issue for companies. The Data Warehousing Institute [Eckerson, 2002; Rockwell, 2012] estimated that poor data costs US businesses 611 billion dollars annually, and that erroneously priced data in retail databases costs US customers 2.5 billion dollars each year. Data quality is therefore becoming more and more critical. The database community pays particular attention to this subject, and a variety of integrity constraints, such as Conditional Functional Dependencies (CFDs), have been studied for data cleaning. Repair techniques based on these constraints are precise at catching inconsistencies but limited in how to correct data exactly. Master data offers a new alternative for data cleaning thanks to its quality properties. With the growing importance of Master Data Management (MDM), a new class of data quality rule known as Editing Rules (ERs) tells how to fix errors, pointing out which attributes are wrong and what values they should take. The intuition is to correct dirty data using high-quality data from the master. However, finding data quality rules is an expensive process that involves intensive manual effort, and it is unrealistic to rely solely on human designers.

    In this thesis, we develop pattern mining techniques for discovering ERs from existing source relations with respect to master relations. In this setting, we propose a new semantics of ERs that takes advantage of both source and master data. Under the proposed satisfaction-based semantics, the discovery problem for ERs turns out to be strongly related to the discovery of both CFDs and one-to-one correspondences between source and target attributes. We first attack the problem of discovering CFDs, concentrating on the particular class of constant CFDs, which are known to be very expressive for detecting inconsistencies, and we extend some well-known concepts introduced for traditional functional dependencies to solve their discovery problem. Secondly, we propose a method based on inclusion dependencies to extract one-to-one correspondences from source to master attributes before automatically building ERs. Finally, we propose heuristics for applying ERs to clean data. We have implemented and evaluated our techniques on both real-life and synthetic databases; experiments show the feasibility, scalability, and robustness of our proposal.
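
    As a rough, self-contained illustration of the editing-rule idea (not the thesis's formal notation), the Python sketch below repairs a dirty source record by copying trusted values from a master record that matches on an assumed key; all relation and attribute names are hypothetical:

    # A minimal sketch of an editing rule (ER): when a source record
    # matches a master record on a key pattern, overwrite the suspect
    # source attributes with the master's trusted values.
    master = [
        {"zip": "10001", "city": "New York", "state": "NY"},
        {"zip": "94105", "city": "San Francisco", "state": "CA"},
    ]

    def apply_editing_rule(record, master_rows, match_on, fix):
        """Apply one ER: match on `match_on`, then correct `fix`."""
        for m in master_rows:
            if all(record[a] == m[a] for a in match_on):
                for a in fix:
                    record[a] = m[a]  # take the high-quality master value
                break
        return record

    dirty = {"zip": "94105", "city": "San Fransisco", "state": "CA"}
    print(apply_editing_rule(dirty, master, match_on=["zip"], fix=["city"]))
    # {'zip': '94105', 'city': 'San Francisco', 'state': 'CA'}

    In the thesis's setting, constant CFDs play the complementary role of detecting which attributes are inconsistent in the first place; the ER then prescribes the repair.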

Huang Gang - One of the best experts on this subject based on the ideXlab platform.

  • A Novel Data Quality Controlling and Assessing Model Based on Rules
    2010 Third International Symposium on Electronic Commerce and Security, 2010
    Co-Authors: Huang Gang, Gao Juntao
    Abstract: identical to the entry listed above under Gao Juntao.

Orlando Belo - One of the best experts on this subject based on the ideXlab platform.

  • Using inheritance in a metadata-based approach to data quality assessment
    Proceedings of the First International Workshop on Model Driven Service Engineering and Data Quality and Security - MoSE+DQS '09, 2009
    Co-Authors: José Farinha, Maria José Trigueiros, Orlando Belo
    Abstract:

    Currently available data quality tools provide development environments that significantly decrease the effort of dealing with common data problems, such as those related to attribute domain validation, syntax checking, or value matching against a reference master data repository. By contrast, more complex and domain-specific data quality functionality, whose requirements usually derive from application-domain business rules, has to be developed from scratch, usually leading to high development and maintenance costs. This paper introduces the concept of inheritance into a metadata-driven approach intended to simplify data quality rule management. The approach is based on the observation that even complex data quality rules very often adhere to recurring patterns that can be encoded and encapsulated as reusable, abstract templates. It is supported by a metamodel developed on top of OMG's Common Warehouse Metamodel, extended here with the ability to derive new rule patterns from existing ones through inheritance. The inheritance metamodel is presented in UML and its application is illustrated with a running example.
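
    The paper's metamodel extends CWM, but the core idea of deriving new rule patterns from existing ones can be hinted at with a much simpler object model; the Python classes and bounds below are illustrative assumptions only:

    # A minimal sketch of rule-pattern inheritance, assuming a plain
    # class hierarchy instead of the paper's CWM-based metamodel.
    class RangeRule:
        """Reusable abstract pattern: a value must lie in [low, high].
        Not meant to be used directly; subclasses bind the bounds."""
        low: float
        high: float

        def check(self, value: float) -> bool:
            return self.low <= value <= self.high

    class PercentageRule(RangeRule):
        """Derived pattern: binds the template's bounds to 0..100."""
        low, high = 0.0, 100.0

    class ProbabilityRule(RangeRule):
        """Another derived pattern: bounds 0..1."""
        low, high = 0.0, 1.0

    print(PercentageRule().check(150.0))  # False: out of range
    print(ProbabilityRule().check(0.3))   # True

    Each derived rule inherits the checking logic and merely binds the template's parameters, which is the style of reuse the paper attributes to inheritance between rule patterns.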

José Farinha - One of the best experts on this subject based on the ideXlab platform.

  • Using inheritance in a metadata-based approach to data quality assessment
    Proceedings of the First International Workshop on Model Driven Service Engineering and Data Quality and Security - MoSE+DQS '09, 2009
    Co-Authors: José Farinha, Maria José Trigueiros, Orlando Belo
    Abstract: identical to the entry listed above under Orlando Belo.