Data Quality Assessment

The experts below are selected from a list of 278,328 experts worldwide, ranked by the ideXlab platform.

Michael G. Kahn - One of the best experts on this subject based on the ideXlab platform.

  • A Comparison of Data Quality Assessment Checks in Six Data Sharing Networks.
    EGEMS (Washington DC), 2017
    Co-Authors: Tiffany J. Callahan, Alan Bauck, Jenny Staab, Meredith Nahm Zozus, David Bertoch, Ritu Khare, Jeffrey R. Brown, Patrick B Ryan, Michael G. Kahn
    Abstract:

    Objective: To compare rule-based data quality (DQ) assessment approaches across multiple national clinical data sharing organizations. Methods: Six organizations with established data quality assessment (DQA) programs provided documentation or source code describing current DQ checks. DQ checks were mapped to the categories within the data verification context of the harmonized DQA terminology. To ensure all DQ checks were consistently mapped, conventions were developed and four iterations of mapping were performed. Difficult-to-map DQ checks were discussed with research team members until consensus was achieved. Results: Participating organizations provided 11,026 DQ checks, of which 99.97 percent were successfully mapped to a DQA category. Of the mapped DQ checks (N = 11,023), 214 (1.94 percent) mapped to multiple DQA categories. The majority of DQ checks mapped to the Atemporal Plausibility (49.60 percent), Value Conformance (17.84 percent), and Atemporal Completeness (12.98 percent) categories. Discussion: Using the common DQA terminology, near-complete (99.97 percent) coverage across a wide range of DQA programs and specifications was reached. Comparing the distributions of mapped DQ checks revealed important differences between participating organizations. This variation may be related to an organization's stakeholder requirements, primary analytical focus, or the maturity of its DQA program. Although outside the scope of this study, mapping checks within the data validation context of the terminology may provide additional insights into differences in DQA practice. Conclusion: A common DQA terminology provides a means to help organizations and researchers understand the coverage of their current DQA efforts and to highlight potential areas for additional DQA development. Sharing DQ checks between organizations could help expand the scope of DQA across clinical data networks.
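
    A toy illustration may help make the mapping concrete. The following Python sketch (not the study's code; the check IDs, rules, and counts are invented) tags rule-based DQ checks with the harmonized DQA categories named above and summarizes their distribution, including checks that map to more than one category:

```python
# Minimal sketch: tagging rule-based DQ checks with harmonized DQA
# categories and summarizing the resulting distribution. All check IDs,
# rules, and data are hypothetical.
from collections import Counter

dq_checks = [
    {"id": "CHK001", "rule": "birth_date <= encounter_date",
     "categories": ["Atemporal Plausibility"]},
    {"id": "CHK002", "rule": "gender code drawn from permitted value set",
     "categories": ["Value Conformance"]},
    {"id": "CHK003", "rule": "encounter record has a non-null provider id",
     "categories": ["Atemporal Completeness"]},
    {"id": "CHK004", "rule": "lab value in physiologic range with a valid unit",
     "categories": ["Atemporal Plausibility", "Value Conformance"]},  # multi-mapped
]

# Distribution of mapped categories, analogous to the paper's comparison
# of where each network's checks concentrate.
counts = Counter(cat for chk in dq_checks for cat in chk["categories"])
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n} ({100 * n / total:.1f}%)")

# Checks that map to more than one DQA category (1.94% in the study).
multi = [chk["id"] for chk in dq_checks if len(chk["categories"]) > 1]
print("Multi-category checks:", multi)
```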

  • A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data
    eGEMs (Generating Evidence & Methods to improve patient outcomes), 2016
    Co-Authors: Michael G. Kahn, Alan Bauck, Tiffany J. Callahan, Juliana Barnard, Jeffrey S Brown, Bruce N Davidson, Hossein Estiri, Carsten Goerg, Erin Holve, Steven G Johnson
    Abstract:

    Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized into a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data are ‘fit’ for specific uses. Materials and Methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and an organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The inclusiveness of the harmonized terminology and logical framework was evaluated against ten published DQ terminologies. Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance, (2) Completeness, and (3) Plausibility, and two DQ assessment contexts: (1) Verification and (2) Validation. The Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified against organizational data or validated against an accepted gold standard, depending on the proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning it to multiple published DQ terminologies. Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data, such as administrative, research, and patient-reported data. Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
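
    The framework's structure is easy to express as a small data model. The sketch below encodes the three categories, the subcategory names as published in the harmonized terminology (Value, Relational, and Computational Conformance; Uniqueness, Atemporal, and Temporal Plausibility; Completeness has no subcategories), and the two assessment contexts; the classify() helper and its labels are illustrative, not part of the paper:

```python
# Minimal sketch of the harmonized framework as a data model. The
# category, subcategory, and context names follow the published
# terminology; the classify() helper is illustrative.
from enum import Enum
from typing import Optional

class Context(Enum):
    VERIFICATION = "checked against local organizational data"
    VALIDATION = "checked against an accepted gold standard"

FRAMEWORK = {
    "Conformance": ["Value Conformance", "Relational Conformance",
                    "Computational Conformance"],
    "Completeness": [],  # no subcategories in the framework
    "Plausibility": ["Uniqueness Plausibility", "Atemporal Plausibility",
                     "Temporal Plausibility"],
}

def classify(category: str, subcategory: Optional[str], context: Context) -> str:
    """Normalize a DQ check's label, e.g. for cross-network reporting."""
    if category not in FRAMEWORK:
        raise ValueError(f"unknown DQ category: {category}")
    if subcategory is not None and subcategory not in FRAMEWORK[category]:
        raise ValueError(f"{subcategory!r} is not a subcategory of {category}")
    return f"{subcategory or category} / {context.name.title()}"

print(classify("Plausibility", "Atemporal Plausibility", Context.VERIFICATION))
# -> Atemporal Plausibility / Verification
```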

  • Data Quality Assessment for Comparative Effectiveness Research in Distributed Data Networks
    Medical Care, 2013
    Co-Authors: Jeffrey S Brown, Michael G. Kahn
    Abstract:

    Background: Electronic health information routinely collected during health care delivery and reimbursement can help address the need for evidence about the real-world effectiveness, safety, and quality of medical care. Often, distributed networks that combine information from multiple sources are needed…

  • A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research
    Medical Care, 2012
    Co-Authors: Michael G. Kahn, Jason M. Glanz, Karen Riedlinger, Marsha A. Raebel, John F Steiner
    Abstract:

    Introduction: Answers to clinical and public health research questions increasingly require aggregated data from multiple sites. Data from electronic health records and other clinical sources are useful for such studies but require stringent quality assessment. Data quality assessment is particularly important in multisite studies to distinguish true variations in care from data quality problems. Methods: We propose a “fit-for-use” conceptual model for data quality assessment and a process model for planning and conducting single-site and multisite data quality assessments. These approaches are illustrated using examples from prior multisite studies. Approach: Critical components of multisite data quality assessment include: thoughtful prioritization of variables and data quality dimensions for assessment; development and use of standardized approaches to data quality assessment that can improve data utility over time; iterative cycles of assessment within and between sites; targeting of assessment toward data domains known to be vulnerable to quality problems; and detailed documentation of the rationale and outcomes of data quality assessments to inform data users. The assessment process requires constant communication between site-level data providers, data coordinating centers, and principal investigators. Discussion: A conceptually based and systematically executed approach to data quality assessment is essential to achieve the potential of the electronic revolution in health care. High-quality data allow “learning health care organizations” to analyze and act on…
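
    As a rough illustration of the process model, the sketch below (hypothetical variable names, checks, and results throughout) runs checks in priority order and writes a documentation log of the kind the authors recommend sharing between site-level data providers and coordinating centers:

```python
# Minimal sketch of a prioritized, documented single-site assessment
# pass. Variable names, checks, and the pass rate are hypothetical;
# a real assessment would query the site's data store.
import csv
from datetime import date

# Priority reflects analytic importance and known vulnerability of each
# data domain to quality problems (both are study-specific judgments).
variables = [
    {"name": "hba1c_result", "priority": 1, "check": "value within 3.0-20.0 %"},
    {"name": "medication_code", "priority": 2, "check": "code in permitted set"},
    {"name": "visit_provider", "priority": 3, "check": "non-null provider id"},
]

def run_check(var):
    fraction_passing = 0.98  # placeholder result
    return {"variable": var["name"], "check": var["check"],
            "fraction_passing": fraction_passing,
            "assessed_on": date.today().isoformat()}

results = [run_check(v) for v in sorted(variables, key=lambda v: v["priority"])]

# Document rationale and outcomes for data users and the coordinating center.
with open("dq_assessment_log.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
```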

Jens Lehmann - One of the best experts on this subject based on the ideXlab platform.

  • NLP Data Cleansing Based on Linguistic Ontology Constraints
    European Semantic Web Conference, 2014
    Co-Authors: Dimitris Kontokostas, Jens Lehmann, Martin Brummer, Sebastian Hellmann, Lazaros Ioannidis
    Abstract:

    Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted in an increasing number of domains. However, the varying quality of published data forms a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology for Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting, and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is – compared to other domains, such as biology – a late Linked Data adopter, but it has seen a steep rise of activity in the creation of data and ontologies. Data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues.
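
    The test-driven style described here can be sketched with a SPARQL query acting as a data quality test case: the query selects violating resources, and an empty result means the test passes. The tiny graph and the NIF-flavored constraint below are illustrative examples, not test cases from the study (requires the rdflib package):

```python
# Minimal sketch of a SPARQL-based data quality test case, in the
# test-driven spirit the paper builds on. Graph and constraint invented.
from rdflib import Graph

DATA = """
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix ex:  <http://example.org/> .

ex:s1 a nif:String ; nif:anchorOf "quality" .
ex:s2 a nif:String .   # violation: no nif:anchorOf
"""

# Test case: every nif:String must carry a nif:anchorOf literal.
TEST = """
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
SELECT ?s WHERE {
  ?s a nif:String .
  FILTER NOT EXISTS { ?s nif:anchorOf ?text . }
}
"""

g = Graph()
g.parse(data=DATA, format="turtle")
violations = list(g.query(TEST))
print(f"{len(violations)} violation(s):", [str(row.s) for row in violations])
```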

  • Crowdsourcing Linked Data Quality Assessment
    International Semantic Web Conference, 2013
    Co-Authors: Maribel Acosta, Sören Auer, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Jens Lehmann
    Abstract:

    In this paper we look into the use of crowdsourcing as a means to handle Linked Data quality problems that are challenging to solve automatically. We analyzed the most common errors encountered in Linked Data sources and classified them according to the extent to which they are likely to be amenable to a specific form of crowdsourcing. Based on this analysis, we implemented a quality assessment methodology for Linked Data that leverages the wisdom of the crowds in different ways: (i) a contest targeting an expert crowd of researchers and Linked Data enthusiasts, complemented by (ii) paid microtasks published on Amazon Mechanical Turk. We empirically evaluated how this methodology could efficiently spot quality issues in DBpedia. We also investigated how the contributions of the two types of crowds could be optimally integrated into Linked Data curation processes. The results show that the two styles of crowdsourcing are complementary and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data.
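
    The triage step can be sketched as a simple routing rule: each flagged problem is classified by error type and sent to the crowd best suited to judge it. The error types, triples, and routing table below are invented for illustration; the actual study derived its classification empirically:

```python
# Minimal sketch of routing flagged Linked Data problems to the crowd
# best suited to judge them. Error types and rules are hypothetical.
flagged = [
    {"triple": ("dbr:Berlin", "dbo:populationTotal", '"-3500000"'),
     "error_type": "implausible_value"},
    {"triple": ("dbr:Berlin", "dbo:birthPlace", "dbr:Germany"),
     "error_type": "wrong_property_domain"},
]

# Judging property semantics needs Linked Data expertise (contest);
# spotting implausible literals suits lay workers (microtasks).
ROUTE = {"implausible_value": "microtask",
         "wrong_property_domain": "expert_contest"}

for item in flagged:
    destination = ROUTE.get(item["error_type"], "expert_contest")
    s, p, o = item["triple"]
    print(f"{destination:>14}: verify <{s} {p} {o}> ({item['error_type']})")
```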

Carlo Batini - One of the best experts on this subject based on the ideXlab platform.

  • Remote Sensing Data Quality Model: From Data Sources to Lifecycle Phases
    International Journal of Image and Data Fusion, 2019
    Co-Authors: Arpad Barsi, Carlo Batini, Zsofia Kugler, Attila Juhasz, Gyorgy Szabo, Hussein Abdulmuttalib, Guoman Huang, Huanfeng Shen
    Abstract:

    The importance of data quality assessment has significantly increased with the boom of information technology and the growing demand for remote sensing (RS) data. The Remote Sensing Data Quality Working Group…

  • Methodologies for Data Quality Assessment and Improvement
    ACM Computing Surveys, 2009
    Co-Authors: Carlo Batini, Cinzia Cappiello, Chiara Francalanci, Andrea Maurino
    Abstract:

    The literature provides a wide range of techniques to assess and improve the quality of data. Due to the diversity and complexity of these techniques, research has recently focused on defining methodologies that help the selection, customization, and application of data quality assessment and improvement techniques. The goal of this article is to provide a systematic and comparative description of such methodologies. Methodologies are compared along several dimensions, including the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and, finally, the types of information systems addressed by each methodology. The article concludes with a summary description of each methodology.
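
    Most of the surveyed methodologies operationalize dimensions as simple ratios of valid to total items. The sketch below (invented records and field names) computes two classic examples, column-level completeness and syntactic accuracy:

```python
# Minimal sketch of two standard dimension metrics in ratio form
# (valid / total). Records and field names are invented.
import re

records = [
    {"name": "Ada Lovelace", "email": "ada@example.org", "year": "1815"},
    {"name": "",             "email": "not-an-email",     "year": "1912"},
    {"name": "Alan Turing",  "email": "alan@example.org", "year": None},
]

EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"  # crude illustrative pattern

def completeness(field):
    present = sum(1 for r in records if r[field] not in (None, ""))
    return present / len(records)

def syntactic_accuracy(field, pattern):
    values = [r[field] for r in records if r[field]]
    valid = sum(1 for v in values if re.fullmatch(pattern, v))
    return valid / len(values) if values else 0.0

print(f"completeness(name) = {completeness('name'):.2f}")
print(f"completeness(year) = {completeness('year'):.2f}")
print(f"accuracy(email)    = {syntactic_accuracy('email', EMAIL):.2f}")
```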

Chunhua Weng - One of the best experts on this subject based on the ideXlab platform.

  • A Data Quality Assessment Guideline for Electronic Health Record Data Reuse.
    EGEMS (Washington DC), 2017
    Co-Authors: Nicole G. Weiskopf, George Hripcsak, Suzanne Bakken, Chunhua Weng
    Abstract:

    Introduction: We describe the formulation, development, and initial expert review of 3x3 Data Quality Assessment (DQA), a dynamic, evidence-based guideline to enable electronic health record (EHR) data quality assessment and reporting for clinical research. Methods: 3x3 DQA was developed through the triangulation of results from three studies: a review of the literature on EHR data quality assessment, a quantitative study of EHR data completeness, and a set of interviews with clinical researchers. Following initial development, the guideline was reviewed by a panel of EHR data quality experts. Results: The guideline embraces the task-dependent nature of data quality and data quality assessment. The core framework includes three constructs of data quality: complete, correct, and current data. These constructs are operationalized according to the three primary dimensions of EHR data: patients, variables, and time. Each of the nine operationalized constructs maps to a methodological recommendation for EHR data quality assessment. The initial expert response to the framework was positive, but improvements are required. Discussion: The initial version of 3x3 DQA promises to enable explicit guideline-based best practices for EHR data quality assessment and reporting. Future work will focus on increasing clarity on how and when 3x3 DQA should be used during the research process, improving the feasibility and ease of use of recommendation execution, and clarifying the process for users to determine which operationalized constructs and recommendations are relevant for a given dataset and study.
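
    The guideline's 3x3 structure can be pictured as a matrix of constructs crossed with dimensions. In the sketch below, the construct and dimension names come from the abstract, while the recommendation strings are placeholders, since the actual methodological recommendations live in the guideline itself:

```python
# Minimal sketch of the 3x3 DQA structure: three data quality constructs
# crossed with three dimensions of EHR data. Recommendation strings are
# placeholders, not the guideline's actual recommendations.
from itertools import product

CONSTRUCTS = ("complete", "correct", "current")
DIMENSIONS = ("patients", "variables", "time")

guideline = {
    (construct, dimension): f"<recommendation for {construct} x {dimension}>"
    for construct, dimension in product(CONSTRUCTS, DIMENSIONS)
}

# Example lookup: how to assess whether the *variables* needed for a
# study are *complete* in the source EHR data.
print(guideline[("complete", "variables")])
print(f"{len(guideline)} operationalized constructs")  # 9, as in 3x3 DQA
```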

  • Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for Clinical Research
    Journal of the American Medical Informatics Association, 2013
    Co-Authors: Nicole G. Weiskopf, Chunhua Weng
    Abstract:

    OBJECTIVE To review the methods and dimensions of data quality assessment in the context of electronic health record (EHR) data reuse for research. MATERIALS AND METHODS A review of the clinical research literature discussing data quality assessment methodology for EHR data was performed. Using an iterative process, the aspects of data quality being measured were abstracted and categorized, as were the methods of assessment used. RESULTS Five dimensions of data quality were identified: completeness, correctness, concordance, plausibility, and currency. Seven broad categories of data quality assessment methods were also identified: comparison with gold standards, data element agreement, data source agreement, distribution comparison, validity checks, log review, and element presence. DISCUSSION Examination of the methods by which clinical researchers have investigated the quality and suitability of EHR data for research shows that there are fundamental features of data quality that may be difficult to measure, as well as proxy dimensions. Researchers interested in the reuse of EHR data for clinical research are encouraged to adopt a consistent taxonomy of EHR data quality, to remain aware of the task-dependence of data quality, to integrate work on data quality assessment from other fields, and to adopt systematic, empirically driven, statistically based methods of data quality assessment. CONCLUSION There is currently little consistency or potential generalizability in the methods used to assess EHR data quality. If the reuse of EHR data for clinical research is to become accepted, researchers should adopt validated, systematic methods of EHR data quality assessment.
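
    One of the seven method categories, data source agreement, is straightforward to illustrate: the same fact is drawn from two sources and the agreement rate is computed. The patient IDs and birth dates below are invented:

```python
# Minimal sketch of the "data source agreement" assessment method: the
# same fact is pulled from two sources and compared. Data is invented.
ehr = {"p1": "1965-02-10", "p2": "1971-07-04", "p3": "1980-12-01"}
registry = {"p1": "1965-02-10", "p2": "1971-04-07", "p3": "1980-12-01"}

shared = ehr.keys() & registry.keys()
agree = sum(1 for pid in shared if ehr[pid] == registry[pid])
print(f"data source agreement on birth_date: {agree}/{len(shared)} "
      f"= {agree / len(shared):.2%}")
```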