Data Quality Measurement

The experts below are selected from a list of 150 experts worldwide, ranked by the ideXlab platform.

Antoon Bronselaer - One of the best experts on this subject based on the ideXlab platform.

  • Measuring Data Quality in Information Systems research
    Decision Support Systems, 2019
    Co-Authors: Yoram Timmerman, Antoon Bronselaer
    Abstract:

    Although contemporary research relies to a large extent on data, data quality in information systems research is a subject that has not received much attention until now. In this paper, a framework is presented for the measurement of scientific data quality using the principles of rule-based measurement. The proposed framework is capable of handling data quality problems due to both incorrect execution and incorrect description of data collection and validation processes. It is then argued that uncertainty can arise during the measurement, which complicates data quality assessment. The framework is therefore extended to handle uncertainty about the truth value of predicates. Instead of a numerical quality level, data quality is then expressed as either a probability distribution or a possibility distribution over the ordinal quality scale. Finally, it is also shown how quality thresholds can be formulated based on the results of the quality measurement. The usefulness of the proposed framework is illustrated throughout the paper with an example of the construction of a possible survey data quality measurement system and, subsequently, the application of that system to a realistic example.
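
    A minimal sketch of the rule-based idea described above, not the authors' implementation: quality rules are predicates over a record, the ordinal quality level counts the satisfied rules, and uncertain rule outcomes induce a probability distribution over that scale. The rule names, the example record and the probabilities are hypothetical, and independence of the rules is assumed purely for illustration.

      from itertools import product

      # Hypothetical survey-quality rules: each predicate maps a record to True/False.
      RULES = {
          "age_in_range":   lambda r: 0 < r["age"] < 120,
          "consent_given":  lambda r: r["consent"] is True,
          "income_numeric": lambda r: isinstance(r["income"], (int, float)),
      }

      def ordinal_quality(record):
          """Crisp measurement: the quality level is the number of satisfied rules."""
          return sum(bool(rule(record)) for rule in RULES.values())

      def quality_distribution(rule_probabilities):
          """Uncertain measurement: given P(rule holds) per rule, return a probability
          distribution over the ordinal quality scale (rules assumed independent)."""
          dist = {level: 0.0 for level in range(len(rule_probabilities) + 1)}
          for outcome in product([True, False], repeat=len(rule_probabilities)):
              p = 1.0
              for holds, prob in zip(outcome, rule_probabilities.values()):
                  p *= prob if holds else 1.0 - prob
              dist[sum(outcome)] += p
          return dist

      record = {"age": 34, "consent": True, "income": "unknown"}
      print(ordinal_quality(record))                    # 2 of 3 rules satisfied
      print(quality_distribution({"age_in_range": 1.0, "consent_given": 0.9,
                                  "income_numeric": 0.4}))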

  • Operational Measurement of Data Quality
    International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2018
    Co-Authors: Antoon Bronselaer, Joachim Nielandt, Toon Boeckling, Guy De Tre
    Abstract:

    In this paper, an alternative view on the measurement of data quality is proposed. Current procedures for data quality measurement provide information about the extent to which data misrepresent reality. These procedures are descriptive in the sense that they provide us with numerical information about the state of the data. In many cases, this information is not sufficient to know whether the data are fit for the task they were meant for. To bridge that gap, we propose a procedure that measures the operational characteristics of data. In this paper, we devise such a procedure by measuring the cost it takes to make data fit for use. We lay out the basics of this procedure and then provide more details on two essential components: tasks and transformation functions.
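
    As a toy illustration of this cost-based view (a sketch under assumptions, not the authors' formalism), a task can be modelled as a list of defect tests with associated transformation functions and repair costs; the quality measure is then the total cost of making a record fit for use. The field names, repairs and costs below are invented.

      # Hypothetical transformation functions: each repairs one defect at a known cost.
      def fill_missing_country(record):
          record["country"] = "UNKNOWN"

      def normalize_date(record):
          record["date"] = record["date"].replace("/", "-")

      # A task is modelled here as a list of (defect test, transformation function, cost).
      TASK = [
          (lambda r: r.get("country") is None, fill_missing_country, 5.0),
          (lambda r: "/" in r.get("date", ""), normalize_date, 1.0),
      ]

      def operational_cost(record, task=TASK):
          """Cost to make the record fit for use: apply every needed repair, sum the costs."""
          total = 0.0
          for is_defective, transform, cost in task:
              if is_defective(record):
                  transform(record)
                  total += cost
          return total

      r = {"country": None, "date": "2018/03/01"}
      print(operational_cost(r), r)   # 6.0 and the repaired record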

  • IPMU (3) - Operational Measurement of Data Quality
    Communications in Computer and Information Science, 2018
    Co-Authors: Antoon Bronselaer, Joachim Nielandt, Toon Boeckling, Guy De Tre
    Abstract:

    In this paper, an alternative view on the measurement of data quality is proposed. Current procedures for data quality measurement provide information about the extent to which data misrepresent reality. These procedures are descriptive in the sense that they provide us with numerical information about the state of the data. In many cases, this information is not sufficient to know whether the data are fit for the task they were meant for. To bridge that gap, we propose a procedure that measures the operational characteristics of data. In this paper, we devise such a procedure by measuring the cost it takes to make data fit for use. We lay out the basics of this procedure and then provide more details on two essential components: tasks and transformation functions.

  • An incremental approach for Data Quality Measurement with insufficient information
    International Journal of Approximate Reasoning, 2018
    Co-Authors: Antoon Bronselaer, Joachim Nielandt, G. De Tré
    Abstract:

    Recently, a fundamental study on the measurement of data quality introduced an ordinal-scaled procedure of measurement. Besides the pure ordinal information about the level of quality, numerical information is induced when considering the uncertainty involved during measurement. In the case where uncertainty is modelled as probability, this numerical information is ratio-scaled. An essential property of the mentioned approach is that the application of a measure on a large collection of data can be represented efficiently, in the sense that (i) the representation has a low storage complexity and (ii) it can be updated incrementally when new data are observed. However, this property only holds when the evaluation of predicates is certain and does not involve uncertainty. For some dimensions of quality, this assumption is far too strong and uncertainty comes into play almost naturally. In this paper, we investigate how the presence of uncertainty influences the efficiency of a measurement procedure. Hereby, we focus specifically on the case where uncertainty is caused by insufficient information and is thus modelled by means of possibility theory. It is shown that the amount of data that reaches a certain level of quality can be summarized as a possibility distribution over the set of natural numbers. We investigate an approximation of this distribution that has a controllable loss of information, allows for incremental updates and exhibits a low space complexity.
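
    The following sketch shows, in a deliberately simplified form, what an incremental summary under possibilistic uncertainty could look like: instead of the full possibility distribution over the natural numbers studied in the paper, it keeps only an interval of counts (records that certainly reach a quality level and records that possibly reach it), which still supports constant-space incremental updates. The class and its inputs are hypothetical.

      class LevelCounter:
          """Approximate incremental answer to 'how many records reach quality level L?'.
          Keeps only a lower bound (certain) and an upper bound (possible) instead of
          the full possibility distribution over the natural numbers."""

          def __init__(self):
              self.certain = 0    # records for which reaching L is certain
              self.possible = 0   # records for which reaching L is at least possible

          def update(self, poss_reaches, poss_not_reaches):
              """Possibility degrees in [0, 1] that a new record does / does not reach
              level L; at least one of the two should equal 1 (normalisation)."""
              if poss_reaches > 0:
                  self.possible += 1
                  if poss_not_reaches == 0:   # no possibility of failing the level
                      self.certain += 1

      counter = LevelCounter()
      counter.update(1.0, 0.0)   # certainly reaches the level
      counter.update(1.0, 0.6)   # possibly reaches it, but not certainly
      counter.update(0.0, 1.0)   # certainly does not reach it
      print(counter.certain, counter.possible)   # 1 2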

  • A Measure-Theoretic Foundation for Data Quality
    IEEE Transactions on Fuzzy Systems, 2018
    Co-Authors: Antoon Bronselaer
    Abstract:

    In this paper, a novel framework for data quality measurement is proposed by adopting a measure-theoretic treatment of the problem. Instead of considering a specific setting in which quality must be assessed, our approach departs more formally from the concept of measurement. The basic assumption of the framework is that the highest possible quality can be described by means of a set of predicates. The quality of data is then measured by evaluating those predicates and by combining their evaluations. This combination is based on a capacity function (i.e., a fuzzy measure) that models, for each combination of predicates, the capacity with respect to the quality of the data. It is shown that expressing quality on an ordinal scale entails a high degree of interpretability and a compact representation of the measurement function. Within this purely ordinal framework for measurement, it is shown that reasoning about quality beyond the ordinal level naturally originates from the uncertainty about predicate evaluation. It is discussed how the proposed framework is positioned with respect to other approaches, with particular attention to the aggregation of measurements. The practical usability of the framework is discussed for several well-known dimensions of data quality and demonstrated in a use case study about clinical trials.
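
    A small sketch of the capacity-based combination, under the assumption that the capacity is given explicitly as a monotone set function over the predicates; the predicates and the ordinal levels 0..3 are invented, and the paper's construction is more general than this.

      # Hypothetical predicates describing the highest possible quality of a record.
      PREDICATES = {
          "complete": lambda r: all(v is not None for v in r.values()),
          "fresh":    lambda r: r["year"] >= 2017,
          "valid_id": lambda r: str(r["id"]).isdigit(),
      }

      # A capacity (fuzzy measure) on subsets of predicates: monotone, 0 on the empty
      # set, maximal on the full set. The ordinal quality levels 0..3 are invented.
      CAPACITY = {
          frozenset(): 0,
          frozenset({"complete"}): 1, frozenset({"fresh"}): 1, frozenset({"valid_id"}): 1,
          frozenset({"complete", "fresh"}): 2, frozenset({"complete", "valid_id"}): 2,
          frozenset({"fresh", "valid_id"}): 1,
          frozenset({"complete", "fresh", "valid_id"}): 3,
      }

      def quality(record):
          """Ordinal quality: the capacity of the set of satisfied predicates."""
          satisfied = frozenset(name for name, p in PREDICATES.items() if p(record))
          return CAPACITY[satisfied]

      print(quality({"id": "42", "year": 2018, "value": 3.1}))   # 3 (all predicates hold)
      print(quality({"id": "42", "year": 2015, "value": None}))  # 1 (only valid_id holds)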

Kathy Sonderer - One of the best experts on this subject based on the ideXlab platform.

  • Measuring Manufacturing Test Data Analysis Quality
    2018 IEEE AUTOTESTCON, 2018
    Co-Authors: Andrew Burkhardt, Sheila Berryman, Ashley Brio, Susan Ferkau, Gloria Hubner, Susan Mittman, Kevin Lynch, Kathy Sonderer
    Abstract:

    Manufacturing test data volumes are constantly increasing. While there has been extensive focus in the literature on big data processing, less focus has existed on data quality, and considerably less focus has been placed specifically on manufacturing test data quality. This paper presents a fully automated test data quality measurement developed by the authors to facilitate analysis of manufacturing test operations, resulting in a single number used to compare manufacturing test data quality across programs and factories and to focus effort cost-effectively. The automation enables program and factory users to see, understand, and improve their test data quality directly. Immediate improvements in test data quality speed up manufacturing test operation analysis, reducing elapsed time and overall spend in test operations. Data quality has significant financial impacts on businesses [1]. While manufacturing cost models are well understood, data quality cost models are less well understood (see Eppler & Helfert [2], who review manufacturing cost models and create a taxonomy for data quality costs). Kim & Choi [3] discuss measuring data quality costs, and a rudimentary data quality cost calculation is described in [4]. Haug et al. [5] describe a classification of costs for poor data quality, and while they do not provide a cost calculation, they do define optimality for data quality. Laranjeiro et al. [6] provide a recent survey of poor data quality classification. Ge & Helfert [7] extend the work in [2] and provide an updated review of data quality costs. Test data is specifically addressed in the context of data processing in [8]. Big data quality efforts are reviewed in [9], [10]. Data quality metrics are discussed in [11], and requirements for data quality metrics are identified in [12]. Data inconsistencies are detailed in [13], while categorical data inconsistencies are explained in [14]. In the current work, manufacturing test data quality is directly correlated with the speed of manufacturing test operations analysis. A measurement for manufacturing test data quality indicates the speed at which analysis can be performed, and increases in the test data quality score have precipitated increases in the speed of analysis, as described herein.
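
    The abstract does not give the scoring formula, so the sketch below only illustrates the general idea of collapsing several automated checks into one comparable number per program or factory; the check names, pass rates and weights are invented.

      # Hypothetical per-check pass rates for one program's test data, with weights
      # expressing relative importance: name -> (pass rate, weight).
      CHECKS = {
          "serial_number_present": (0.98, 3.0),
          "timestamps_monotonic":  (0.91, 2.0),
          "limits_attached":       (0.75, 1.0),
      }

      def data_quality_score(checks):
          """Single number in [0, 100]: weighted average of check pass rates.
          Illustrative only; the paper's actual scoring formula is not described here."""
          total_weight = sum(weight for _, weight in checks.values())
          return 100.0 * sum(rate * weight for rate, weight in checks.values()) / total_weight

      print(round(data_quality_score(CHECKS), 1))   # one comparable score per program/factory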

Andrew Burkhardt - One of the best experts on this subject based on the ideXlab platform.

  • Measuring Manufacturing Test Data Analysis Quality
    2018 IEEE AUTOTESTCON, 2018
    Co-Authors: Andrew Burkhardt, Sheila Berryman, Ashley Brio, Susan Ferkau, Gloria Hubner, Susan Mittman, Kevin Lynch, Kathy Sonderer
    Abstract:

    Manufacturing test data volumes are constantly increasing. While there has been extensive focus in the literature on big data processing, less focus has existed on data quality, and considerably less focus has been placed specifically on manufacturing test data quality. This paper presents a fully automated test data quality measurement developed by the authors to facilitate analysis of manufacturing test operations, resulting in a single number used to compare manufacturing test data quality across programs and factories and to focus effort cost-effectively. The automation enables program and factory users to see, understand, and improve their test data quality directly. Immediate improvements in test data quality speed up manufacturing test operation analysis, reducing elapsed time and overall spend in test operations. Data quality has significant financial impacts on businesses [1]. While manufacturing cost models are well understood, data quality cost models are less well understood (see Eppler & Helfert [2], who review manufacturing cost models and create a taxonomy for data quality costs). Kim & Choi [3] discuss measuring data quality costs, and a rudimentary data quality cost calculation is described in [4]. Haug et al. [5] describe a classification of costs for poor data quality, and while they do not provide a cost calculation, they do define optimality for data quality. Laranjeiro et al. [6] provide a recent survey of poor data quality classification. Ge & Helfert [7] extend the work in [2] and provide an updated review of data quality costs. Test data is specifically addressed in the context of data processing in [8]. Big data quality efforts are reviewed in [9], [10]. Data quality metrics are discussed in [11], and requirements for data quality metrics are identified in [12]. Data inconsistencies are detailed in [13], while categorical data inconsistencies are explained in [14]. In the current work, manufacturing test data quality is directly correlated with the speed of manufacturing test operations analysis. A measurement for manufacturing test data quality indicates the speed at which analysis can be performed, and increases in the test data quality score have precipitated increases in the speed of analysis, as described herein.

London Jan - One of the best experts on this subject based on the ideXlab platform.

  • Open Data Quality Measurement framework: Definition and application to Open Government Data
    Government Information Quarterly, 2017
    Co-Authors: François Van Schalkwyk, Michelle Willmers, Maurice Mcnaughton, Jeffrey Thorsby, Genie N L Stowers, Kristen Wolslegel, Ellie Tumbuan, Anton Gerunov, Jeni This, London Jan
    Abstract:

    The diffusion of Open Government Data (OGD) in recent years has proceeded at a very fast pace. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse and negatively affect civic participation. Current approaches to the problem in the literature lack a comprehensive theoretical framework. Moreover, most of the evaluations concentrate on open data platforms rather than on datasets. In this work, we address these two limitations and set up a framework of indicators to measure the quality of Open Government Data on a series of data quality dimensions at the most granular level of measurement. We validated the evaluation framework by applying it to compare two cases of Italian OGD datasets: an internationally recognized good example of OGD, with centralized disclosure and extensive data quality controls, and samples of OGD from decentralized data disclosure (municipality level), with no possibility of extensive quality controls as in the former case, and hence with supposedly lower quality. Starting from measurements based on the quality framework, we were able to verify the difference in quality: the measures showed a few common acquired good practices and weaknesses, and a set of discriminating factors that pertain to the type of datasets and the overall approach. On the basis of this evaluation, we also provide technical and policy guidelines to overcome the weaknesses observed in the decentralized release policy, addressing specific quality aspects.
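
    A minimal sketch of dataset-level indicator computation in the spirit of such a framework (the paper's actual indicator set is far richer); the two indicators, the column name and the sample CSV are invented.

      import csv, io

      # Two hypothetical dataset-level indicators for the completeness and accuracy dimensions.
      def completeness(rows):
          cells = [value for row in rows for value in row.values()]
          return sum(1 for value in cells if value.strip()) / len(cells)

      def accuracy_year(rows, column="year"):
          return sum(1 for row in rows if row[column].isdigit()) / len(rows)

      def measure_dataset(csv_text):
          rows = list(csv.DictReader(io.StringIO(csv_text)))
          return {"completeness": completeness(rows), "accuracy(year)": accuracy_year(rows)}

      sample = "name,year\nMilan budget,2016\nTurin budget,\nRome budget,20x6\n"
      print(measure_dataset(sample))   # {'completeness': 0.83..., 'accuracy(year)': 0.33...}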

Guy De Tre - One of the best experts on this subject based on the ideXlab platform.

  • Operational Measurement of Data Quality
    International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2018
    Co-Authors: Antoon Bronselaer, Joachim Nielandt, Toon Boeckling, Guy De Tre
    Abstract:

    In this paper, an alternative view on the measurement of data quality is proposed. Current procedures for data quality measurement provide information about the extent to which data misrepresent reality. These procedures are descriptive in the sense that they provide us with numerical information about the state of the data. In many cases, this information is not sufficient to know whether the data are fit for the task they were meant for. To bridge that gap, we propose a procedure that measures the operational characteristics of data. In this paper, we devise such a procedure by measuring the cost it takes to make data fit for use. We lay out the basics of this procedure and then provide more details on two essential components: tasks and transformation functions.

  • IPMU (3) - Operational Measurement of Data Quality
    Communications in Computer and Information Science, 2018
    Co-Authors: Antoon Bronselaer, Joachim Nielandt, Toon Boeckling, Guy De Tre
    Abstract:

    In this paper, an alternative view on the measurement of data quality is proposed. Current procedures for data quality measurement provide information about the extent to which data misrepresent reality. These procedures are descriptive in the sense that they provide us with numerical information about the state of the data. In many cases, this information is not sufficient to know whether the data are fit for the task they were meant for. To bridge that gap, we propose a procedure that measures the operational characteristics of data. In this paper, we devise such a procedure by measuring the cost it takes to make data fit for use. We lay out the basics of this procedure and then provide more details on two essential components: tasks and transformation functions.

  • A Possibilistic Treatment of Data Quality Measurement
    International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), 2016
    Co-Authors: Antoon Bronselaer, Guy De Tre
    Abstract:

    The ever-growing capabilities of data storage systems have created the need to assess the quality of data in an efficient manner. In this paper, we consider a framework of data quality measurement that relies on basic predicates formulated on the data. It is then motivated that in some cases the evaluation of predicates is hindered due to a lack of information. As a result, the truth value of a predicate cannot be determined with complete certainty. In this paper, it is first shown how such uncertainty about the evaluation of predicates can be modelled. Such uncertainty can then be propagated throughout the measurement process. This establishes a possibilistic measurement of data quality.
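
    A short sketch of how uncertain predicate evaluations could be represented and propagated; the predicate names and possibility degrees are invented, and the min/max combination of non-interactive possibility distributions is a standard possibilistic rule rather than necessarily the exact propagation used in the paper.

      # Each predicate evaluation is a possibility distribution over {True, False},
      # written as (poss_true, poss_false) with max(poss_true, poss_false) == 1.
      EVALUATIONS = {
          "address_is_current": (1.0, 0.3),   # plausibly true, weakly possible it is not
          "email_is_valid":     (1.0, 0.0),   # certainly true
          "phone_reachable":    (0.6, 1.0),   # more plausibly false
      }

      def conjunction(evaluations):
          """Possibility that all predicates hold / that at least one fails,
          using min/max combination of non-interactive possibility distributions."""
          poss_all_true = min(pt for pt, _ in evaluations.values())
          poss_some_false = max(pf for _, pf in evaluations.values())
          return poss_all_true, poss_some_false

      print(conjunction(EVALUATIONS))   # (0.6, 1.0): top quality is only weakly possible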

  • IPMU (2) - A Possibilistic Treatment of Data Quality Measurement
    Information Processing and Management of Uncertainty in Knowledge-Based Systems, 2016
    Co-Authors: Antoon Bronselaer, Guy De Tre
    Abstract:

    The ever-growing capabilities of data storage systems have created the need to assess the quality of data in an efficient manner. In this paper, we consider a framework of data quality measurement that relies on basic predicates formulated on the data. It is then motivated that in some cases the evaluation of predicates is hindered due to a lack of information. As a result, the truth value of a predicate cannot be determined with complete certainty. In this paper, it is first shown how such uncertainty about the evaluation of predicates can be modelled. Such uncertainty can then be propagated throughout the measurement process. This establishes a possibilistic measurement of data quality.