Data Quality Tool

Vojtech Huser

  • MedInfo - Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison.
    Studies in Health Technology and Informatics, 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Xiaochun Li, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract:

    Large healthcare datasets of electronic health record data have become indispensable in clinical research, and data quality in such datasets has recently become a focus of many distributed research networks. Although data quality is specific to a given research question, many existing data quality platforms demonstrate that general data quality assessment at the dataset level (given a spectrum of research questions) is possible and highly requested by researchers. We present a comparison of 12 datasets and an extension of the Achilles Heel data quality software tool with new rules and data characterization measures.

  • Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison
    World Congress on Medical and Health Informatics (MedInfo), 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract: identical to the abstract of the Studies in Health Technology and Informatics (2019) entry above.

  • PSB - Methods for examining Data Quality in healthcare integrated Data repositories.
    Pacific Symposium on Biocomputing, 2017
    Co-Authors: Vojtech Huser, Michael G. Kahn, Jeffrey S. Brown, Ramkiran Gouripeddi
    Abstract:

    This paper summarizes the content of a workshop on data quality. The first speaker (VH) described the data quality infrastructure and evaluation methods currently in place within the Observational Health Data Sciences and Informatics (OHDSI) consortium, in particular a data quality tool called Achilles Heel and the latest developments extending it. Interim results of an ongoing data quality study within the OHDSI consortium were also presented. The second speaker (MK) described lessons learned and new data quality checks developed by the PEDSnet pediatric research network. The last two speakers (JB, RG) described the tools developed by the Sentinel Initiative and the University of Utah's service-oriented framework. Throughout the workshop, and in a closing discussion, participants considered how data quality assessment can be advanced by combining the best features of each network.

  • Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets.
    EGEMS (Washington DC), 2016
    Co-Authors: Vojtech Huser, Rae Woong Park, Frank J. Defalco, Martijn J. Schuemie, Patrick B. Ryan, Ning Shang, Mark Velez, Richard D. Boyce, Jon D. Duke, Ritu Khare
    Abstract:

    Introduction: Data quality and fitness for analysis are crucial if the outputs of analyses of electronic health record data or administrative claims data are to be trusted by the public and the research community. Methods: We describe a data quality analysis tool (called Achilles Heel) developed by the Observational Health Data Sciences and Informatics Collaborative (OHDSI) and compare the outputs of this tool as applied to 24 large healthcare datasets across seven different organizations. Results: We highlight 12 data quality rules that identified issues in at least 10 of the 24 datasets and provide the full set of 71 rules that identified issues in at least one dataset. Achilles Heel is freely available software that provides a useful starter set of data quality rules, with the ability to add further rules. We also present the results of a structured email-based interview of all participating sites, which collected qualitative comments about the value of Achilles Heel for data quality evaluation. Discussion: Our analysis represents the first comparison of outputs from a data quality tool that implements a fixed (but extensible) set of data quality rules. Thanks to a common data model, we were able to quickly compare multiple datasets originating from several countries in America, Europe and Asia.
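Achilles Heel is described as a fixed but extensible set of data quality rules, and the multi-site comparison counts how many datasets each rule flags. The Python sketch below illustrates that idea under loose assumptions: it is not the OHDSI implementation, each dataset is reduced to a hypothetical dictionary of precomputed counts, and the rule names (`birth_after_death`, `future_visits`, `missing_gender`) and helpers (`rule`, `run_rules`, `tally`, `common_rules`) are illustrative only.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical dataset summary: precomputed counts keyed by check name.
Summary = Dict[str, int]
Rule = Callable[[Summary], bool]

# Registry of rules; adding a rule extends the starter set.
RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Register a data quality rule under `name`."""
    def register(fn: Rule) -> Rule:
        RULES[name] = fn
        return fn
    return register


@rule("birth_after_death")
def birth_after_death(s: Summary) -> bool:
    # Hypothetical check: persons whose recorded death precedes their birth.
    return s.get("persons_death_before_birth", 0) > 0


@rule("future_visits")
def future_visits(s: Summary) -> bool:
    return s.get("visits_in_future", 0) > 0


@rule("missing_gender")
def missing_gender(s: Summary) -> bool:
    return s.get("persons_without_gender", 0) > 0


def run_rules(summary: Summary) -> List[str]:
    """Names of all registered rules that flag an issue in one dataset."""
    return [name for name, check in RULES.items() if check(summary)]


def tally(datasets: Dict[str, Summary]) -> Counter:
    """For each rule, count how many datasets it flagged."""
    counts: Counter = Counter()
    for summary in datasets.values():
        counts.update(run_rules(summary))
    return counts


def common_rules(datasets: Dict[str, Summary], min_datasets: int) -> List[str]:
    """Rules that flagged issues in at least `min_datasets` datasets."""
    return sorted(name for name, n in tally(datasets).items() if n >= min_datasets)


sites = {
    "site_a": {"persons_death_before_birth": 3, "visits_in_future": 0},
    "site_b": {"persons_death_before_birth": 1, "persons_without_gender": 5},
    "site_c": {"visits_in_future": 2},
}
print(common_rules(sites, min_datasets=2))  # ['birth_after_death']
```

Adding a rule only requires decorating another function, which mirrors the "starter set plus additional rules" design; calling `common_rules` with `min_datasets=10` over 24 summaries would correspond to the "at least 10 of the 24 datasets" cut-off used in the study.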

Karthik Natarajan

  • MedInfo - Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison.
    Studies in Health Technology and Informatics, 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Xiaochun Li, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract: identical to the abstract of this entry listed under Vojtech Huser above.

  • Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison
    World Congress on Medical and Health Informatics (MedInfo), 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract: identical to the abstract of this entry listed under Vojtech Huser above.

Robert A. Verheij

  • TRANSFoRm Data Quality Tool
    Journal of Clinical Bioinformatics, 2015
    Co-Authors: Robert A. Verheij
    Abstract:

    Tool description: As the computerisation of primary care facilities rapidly increases, a wealth of information is created in routinely recorded electronic health records (EHR) that can be, and already is, used for research purposes. However, we need to be able to assess whether the data these primary care databases contain is ‘fit for purpose’. In the TRANSFoRm project, an axiomatic approach was developed that starts by defining the purpose and population, followed by a set of data quality metrics described in terms of completeness, accuracy, correctness and consistency. Based on this approach, a web-based prototype of a data quality tool [1] was developed and evaluated in two TRANSFoRm clinical use cases (GORD and diabetes), allowing researchers to select primary care sites whose data is fit for purpose. For example, a researcher might only be interested in practices with more or less complete recording of HbA1c, blood glucose and smoking status (Figure 1). Status of development: prototype.
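The practice-selection example can be sketched for the completeness dimension alone. This is a hypothetical Python illustration, not the TRANSFoRm tool: records are assumed to be dictionaries in which `None` marks a missing value, the variable keys (`hba1c`, `glucose`, `smoking`) and the 0.9 threshold stand in for "more or less complete", and `completeness` and `fit_for_purpose` are invented helper names.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical patient record: variable name -> recorded value, None if missing.
Record = Dict[str, Optional[object]]


def completeness(records: Sequence[Record], variable: str) -> float:
    """Fraction of records with a non-missing value for `variable`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(variable) is not None)
    return present / len(records)


def fit_for_purpose(
    practices: Dict[str, Sequence[Record]],
    variables: Sequence[str],
    threshold: float,
) -> List[str]:
    """Names of practices whose completeness meets `threshold` for every variable."""
    return sorted(
        name
        for name, records in practices.items()
        if all(completeness(records, v) >= threshold for v in variables)
    )


practices = {
    "practice_1": [
        {"hba1c": 48.0, "glucose": 6.1, "smoking": "never"},
        {"hba1c": 52.0, "glucose": 7.0, "smoking": "current"},
    ],
    "practice_2": [
        {"hba1c": None, "glucose": 5.8, "smoking": "never"},
        {"hba1c": 60.0, "glucose": None, "smoking": None},
    ],
}

selected = fit_for_purpose(practices, ["hba1c", "glucose", "smoking"], threshold=0.9)
print(selected)  # ['practice_1']
```

Requiring every chosen variable to meet the threshold reflects the purpose-first approach: the researcher's question determines which variables matter, and a practice is excluded as soon as one of them is too sparsely recorded.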

Rae Woong Park

  • MedInfo - Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison.
    Studies in Health Technology and Informatics, 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Xiaochun Li, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract: identical to the abstract of this entry listed under Vojtech Huser above.

  • Extending Achilles Heel Data Quality Tool with New Rules Informed by Multi-Site Data Quality Comparison
    World Congress on Medical and Health Informatics (MedInfo), 2019
    Co-Authors: Vojtech Huser, Juan M. Banda, Hanieh Razzaghi, Ajit Londhe, Zuoyi Zhang, Sungjae Jung, Rae Woong Park, Karthik Natarajan
    Abstract: identical to the abstract of this entry listed under Vojtech Huser above.

  • Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets.
    EGEMS (Washington DC), 2016
    Co-Authors: Vojtech Huser, Rae Woong Park, Frank J. Defalco, Martijn J. Schuemie, Patrick B. Ryan, Ning Shang, Mark Velez, Richard D. Boyce, Jon D. Duke, Ritu Khare
    Abstract: identical to the abstract of this entry listed under Vojtech Huser above.

Cláudia Da Silva

  • The Online Pollen Catalogs Network (RCPol) Data Quality assurance system
    Biodiversity Information Science and Standards, 2018
    Co-Authors: Allan Veiga, Antonio Mauro Saraiva, Cláudia Da Silva
    Abstract:

    The Online Pollen Catalogs Network (RCPol) (http://rcpol.org.br) was conceived to promote interaction among researchers and the integration of data from pollen collections, herbaria and bee collections. To structure RCPol's work, researchers and collaborators have organized information on palynology in four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. These datasets are assessed by the RCPol palynology experts, and when a dataset complies with the RCPol data quality policy, it is published to http://chaves.rcpol.org.br. Data quality assessment used to be performed manually by the experts, which was time-consuming and inconsistent in detecting data quality problems such as incomplete and inconsistent information. To support data quality assessment in a more automated and effective way, we are developing a data quality tool that implements a series of mechanisms to measure, validate and improve the completeness, consistency, conformity, accessibility and uniqueness of data prior to a manual expert assessment. The system was designed according to the conceptual framework proposed by Task Group 1 of the Biodiversity Data Quality Interest Group (Veiga et al. 2017). For each sheet in the Google Spreadsheet, the system generates a set of assertions of measures, validations and amendments for the records (rows) and datasets (sheets), according to a profile defined for RCPol. The profile follows the policies of data quality measurement, validation and enhancement. The data quality measurement policy encompasses the dimensions of completeness, consistency, conformity, accessibility and uniqueness. RCPol uses a quality assurance approach: only data that comply with all the quality requirements are published in the system. Therefore, its data quality validation policy accepts only datasets with 100% completeness, consistency, conformity, accessibility and uniqueness. To improve quality in each relevant dimension, a set of enhancements was defined in the data quality enhancement policy. Based on this RCPol profile, the system can generate reports that contain measure, validation and amendment assertions, together with the method and tool used to generate each assertion. This web-based system can be tested at http://chaves.rcpol.org.br/admin/Data-Quality with the dataset https://docs.google.com/spreadsheets/u/1/d/1gH0aa2qqnAgfAixGom3Gnx6Qp91ZvWhUHPb_QeoIreQ. The system ensures that only data compliant with the data quality profile defined by RCPol are deemed fit for use and can be published, which significantly reduces the experts' workload. Some data may still contain values that cannot easily be assessed automatically, e.g. whether the content of an image matches the corresponding scientific name, so manual expert assessment remains necessary. After the system reports that the data comply with the profile, the experts must perform a manual assessment, using the data quality report as support, and only then are the data published. The next steps include archiving the data quality reports in a database, improving the web interface to enable searching and sorting of assertions, and providing a machine-readable interface for the data quality reports.
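The measure, validation and amendment assertions described above can be sketched as follows. This is a hypothetical Python illustration covering only two of the five dimensions (completeness and uniqueness) plus one kind of amendment; the required fields, the `catalog_number` identifier, the whitespace-trimming amendment and the `Assertion`, `assess` and `publishable` names are assumptions, not RCPol's actual profile or code. The 100% validation threshold follows the quality assurance policy stated in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical profile: fields every record must fill in to count as complete.
REQUIRED = ("scientific_name", "collector", "institution")


@dataclass
class Assertion:
    kind: str       # "measure", "validation" or "amendment"
    dimension: str  # e.g. "completeness", "uniqueness", "conformity"
    target: str     # "record <index>" or "dataset"
    value: object


def assess(rows: List[Row], id_field: str = "catalog_number") -> List[Assertion]:
    """Generate record- and dataset-level assertions for one sheet."""
    out: List[Assertion] = []
    complete_rows = 0
    for i, row in enumerate(rows):
        filled = sum(1 for f in REQUIRED if row.get(f, "").strip())
        ratio = filled / len(REQUIRED)
        out.append(Assertion("measure", "completeness", f"record {i}", ratio))
        if ratio == 1.0:
            complete_rows += 1
        # Amendment: propose a trimmed value wherever stray whitespace is found.
        for field, value in row.items():
            if value != value.strip():
                out.append(Assertion("amendment", "conformity", f"record {i}", {field: value.strip()}))
    ids = [row.get(id_field, "") for row in rows]
    dataset_measures = {
        "completeness": complete_rows / len(rows) if rows else 0.0,
        "uniqueness": len(set(ids)) / len(ids) if ids else 0.0,
    }
    for dimension, value in dataset_measures.items():
        out.append(Assertion("measure", dimension, "dataset", value))
        # Quality assurance policy: only a 100% measure passes validation.
        out.append(Assertion("validation", dimension, "dataset", value == 1.0))
    return out


def publishable(assertions: List[Assertion]) -> bool:
    """True only if every dataset-level validation passed."""
    return all(a.value for a in assertions if a.kind == "validation" and a.target == "dataset")


rows = [
    {"catalog_number": "P1", "scientific_name": "Mimosa pudica", "collector": "Silva", "institution": "USP"},
    {"catalog_number": "P2", "scientific_name": "Bidens pilosa ", "collector": "", "institution": "USP"},
]
report = assess(rows)
print(publishable(report))  # False
```

Keeping amendments as proposals rather than applying them in place fits the workflow described above, where experts review the report before anything is published.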

  • The Online Pollen Catalogs Network (RCPol)
    Biodiversity Information Science and Standards, 2018
    Co-Authors: Allan Veiga, Antonio Mauro Saraiva, Cláudia Da Silva
    Abstract:

    RCPol was created in 2013 to promote interaction among researchers and the integration of data from their pollen collections, herbaria and bee collections. To structure RCPol's work, researchers and collaborators have organized information on palynology and trophic interactions between bees and plants. During the project, several computing tools were developed and made available on the RCPol website (http://rcpol.org.br), including: interactive keys with multiple inputs for species identification (http://chaves.rcpol.org.br); a glossary of palynology-related terms (http://chaves.rcpol.org.br/profile/glossary/eco); a plant-bee interactions database (http://chaves.rcpol.org.br/interactions); and a data quality tool (http://chaves.rcpol.org.br/admin/Data-Quality). These tools were developed in partnership with researchers and collaborators from Escola Politécnica (USP) and other Brazilian and foreign institutions working on palynology, floral biology, pollination, plant taxonomy, ecology and trophic interactions. The interactive keys are organized in four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. All of it is assessed by a data quality assurance tool (based on the conceptual framework of the TDWG Biodiversity Data Quality Interest Group, Veiga et al. 2017) and curated by palynology experts. In total, the network has published 1,774 specimen records, 1,488 species records (generated automatically by merging specimen records with the same scientific name), 656 interaction records, 370 glossary term records and 15 institution records, all of them translated from the original language (usually Portuguese or English) into Portuguese, English and Spanish.
    During the project's first three years, 106 partners, researchers and collaborators from 28 institutions in Brazil and abroad, actively participated in the project. An important part of the project's activities involved training researchers and students in palynology, data digitization and the use of the system. To date, six training courses have reached 192 people.
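The RCPol species records are described as generated by merging specimen records that share a scientific name. A minimal Python sketch of such a merge follows, assuming each specimen is a dictionary with a `scientific_name` key and that the remaining fields (here hypothetical `institution` and `collector`) are pooled as sets of distinct values; RCPol's actual merge rules are not described in the abstract, and `merge_species` is an illustrative name.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

Specimen = Dict[str, str]


def merge_species(specimens: List[Specimen]) -> Dict[str, Dict[str, Set[str]]]:
    """Group specimen records by scientific name, pooling distinct field values."""
    species: DefaultDict[str, DefaultDict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for record in specimens:
        # Normalize internal and surrounding whitespace so near-duplicate names merge.
        name = " ".join(record["scientific_name"].split())
        entry = species[name]  # create the species entry even if no other field is filled
        for field, value in record.items():
            if field != "scientific_name" and value:
                entry[field].add(value)
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {name: dict(fields) for name, fields in species.items()}


specimens = [
    {"scientific_name": "Mimosa pudica", "institution": "USP", "collector": "Silva"},
    {"scientific_name": "Mimosa  pudica", "institution": "UFBA", "collector": "Silva"},
    {"scientific_name": "Bidens pilosa", "institution": "USP", "collector": "Souza"},
]
merged = merge_species(specimens)
print(sorted(merged))  # ['Bidens pilosa', 'Mimosa pudica']
```

Pooling values as sets keeps the merge lossless for differing attributes (two institutions for one species) while collapsing repeated ones.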