Misclassification

The Experts below are selected from a list of 37,119 Experts worldwide, ranked by the ideXlab platform.

George Maldonado - One of the best experts on this subject based on the ideXlab platform.

  • O4E.4 Application of probabilistic bias analysis to adjust for exposure Misclassification in a cohort of trichlorophenol workers
    Occupational and Environmental Medicine, 2019
    Co-Authors: Laura Scott, George Maldonado
    Abstract:

    This study demonstrates the application of probabilistic bias analysis to quantify and adjust for exposure Misclassification in a historical cohort mortality study of New Zealand trichlorophenol workers in which exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) was measured as a multi-level variable. Published exposure information available for this cohort was used to specify the initial bias parameter distributions, which were then varied under 18 scenarios to assess the potential impact of differing amounts of Misclassification as well as both non-differential and differential exposure Misclassification. For each scenario, each bias parameter distribution was sampled 50,000 times using Monte Carlo simulation techniques to generate adjusted counts of cases and non-cases of ischemic heart disease (IHD) by exposure group. These counts were then used to calculate odds ratios adjusted for exposure Misclassification and the associated exposure Misclassification error terms. Given the specified assumptions, the geometric mean (GM) adjusted odds ratio ranged from 2.89 to 5.05, and the GM error term ranged from 0.60 to 1.05. In all non-differential scenarios, and in scenarios in which non-cases had greater proportions of Misclassification, the observed odds ratio of 3.05 was closer to the null (i.e., 1) than the GM adjusted odds ratio. In the differential simulations where cases had higher proportions of Misclassification, the direction of the error depended on the amount of Misclassification, with smaller proportions of Misclassification resulting in the observed odds ratio being farther from the null than the GM adjusted odds ratio. These findings demonstrate that probabilistic bias analysis of historical cohort mortality studies can be an effective tool for understanding trends in study error stemming from exposure Misclassification, and they confirm the importance of quantifying potential sources of systematic error.
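
    A minimal sketch of the general workflow described above, simplified to a binary rather than multi-level exposure: sensitivity and specificity of exposure classification are drawn from beta distributions, the observed counts are corrected on each draw, and the geometric mean of the adjusted odds ratios is reported. All counts and distribution parameters below are illustrative placeholders, not values from the study.

    ```python
    # Probabilistic bias analysis for exposure misclassification (binary exposure).
    # Placeholder counts and bias-parameter distributions; non-differential case.
    import numpy as np

    rng = np.random.default_rng(42)
    n_sim = 50_000

    # Placeholder observed counts: [exposed, unexposed] among cases and non-cases.
    cases_obs = np.array([40.0, 60.0])
    noncases_obs = np.array([100.0, 400.0])

    # Bias parameters sampled from beta distributions (same in cases and non-cases).
    se = rng.beta(80, 20, n_sim)   # P(classified exposed | truly exposed)
    sp = rng.beta(95, 5, n_sim)    # P(classified unexposed | truly unexposed)

    def adjust(obs, se, sp):
        """Back-calculate true exposed/unexposed counts from observed counts."""
        n = obs.sum()
        truly_exposed = (obs[0] - n * (1 - sp)) / (se + sp - 1)
        return truly_exposed, n - truly_exposed

    ors = []
    for i in range(n_sim):
        a1, b1 = adjust(cases_obs, se[i], sp[i])
        a0, b0 = adjust(noncases_obs, se[i], sp[i])
        if min(a1, b1, a0, b0) > 0:          # keep only admissible corrections
            ors.append((a1 / b1) / (a0 / b0))

    ors = np.array(ors)
    gm_or = np.exp(np.mean(np.log(ors)))     # geometric mean adjusted odds ratio
    print(f"GM adjusted OR: {gm_or:.2f}")
    print("2.5th-97.5th percentile:", np.percentile(ors, [2.5, 97.5]).round(2))
    ```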

  • O8C.1 A probabilistic bias analysis method for evaluating disease Misclassification in a historical cohort mortality study of trichlorophenol workers
    Occupational and Environmental Medicine, 2019
    Co-Authors: Laura Scott, George Maldonado
    Abstract:

    Occupational epidemiologists have long considered death certificate inaccuracies a critical issue when conducting historical cohort mortality (HCM) studies. However, the vast majority of such studies do not include a quantitative assessment of the impact of disease Misclassification on study results. We developed a probabilistic bias analysis method to evaluate the effect of disease Misclassification using ischemic heart disease (IHD) mortality data from a cohort study of New Zealand trichlorophenol workers exposed to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD). The sensitivity and specificity of IHD death certificate diagnoses in high-income countries, as described in 14 peer-reviewed validation studies, were used to construct bias parameter distributions for nine simulation scenarios. We first defined the parameter distributions for a non-differential disease Misclassification scenario, using a beta distribution for the sensitivity parameters and a maximum extreme distribution for the specificity parameters. To evaluate the potential effects of differential Misclassification, we also varied the distribution peaks for the highest and lowest exposure categories. As before, a beta distribution was used for all sensitivity parameters; however, both maximum and minimum extreme distributions were used for the specificity parameters in these scenarios. For each scenario, the specified sensitivity and specificity distributions were sampled using Monte Carlo techniques. The inverse of the sampled classification-proportion matrix was then multiplied by the vector of observed cell counts to generate a vector of cell counts adjusted for disease Misclassification, which was then used to calculate an adjusted odds ratio. When Misclassification was differential, the geometric mean adjusted odds ratio ranged from 1.9 to 4.9, with study error resulting in bias both away from and toward the null. Under the assumption of non-differential Misclassification, the geometric mean adjusted odds ratio was slightly smaller than the unadjusted estimate. Probabilistic bias analysis can be a useful tool for evaluating study error in historical cohort mortality studies.
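
    The matrix-inversion step described above can be sketched as follows. The counts are placeholders, the exposure grouping is reduced to two levels, and beta distributions are used for both bias parameters for simplicity (the paper uses extreme-value distributions for specificity).

    ```python
    # Matrix-inversion adjustment for disease (outcome) misclassification.
    import numpy as np

    rng = np.random.default_rng(7)
    n_sim = 50_000

    # Observed [IHD deaths, other deaths] in high- and low-exposure groups (placeholders).
    obs_high = np.array([30.0, 270.0])
    obs_low = np.array([50.0, 950.0])

    ors = []
    for _ in range(n_sim):
        se = rng.beta(60, 20)    # P(IHD recorded on certificate | true IHD death)
        sp = rng.beta(97, 3)     # P(IHD not recorded | not an IHD death)
        # Columns give P(observed class | true class): first column = true IHD deaths.
        C = np.array([[se, 1 - sp],
                      [1 - se, sp]])
        adj_high = np.linalg.inv(C) @ obs_high   # adjusted [true IHD, true other]
        adj_low = np.linalg.inv(C) @ obs_low
        if adj_high.min() > 0 and adj_low.min() > 0:
            ors.append((adj_high[0] / adj_high[1]) / (adj_low[0] / adj_low[1]))

    gm = np.exp(np.log(ors).mean())
    print(f"Geometric mean adjusted OR over {len(ors)} admissible draws: {gm:.2f}")
    ```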

  • How far from non-differential does exposure or disease Misclassification have to be to bias measures of association away from the null?
    International Journal of Epidemiology, 2008
    Co-Authors: Anne M Jurek, Sander Greenland, George Maldonado
    Abstract:

    A well-known heuristic in epidemiology is that non-differential exposure or disease Misclassification biases the expected values of an estimator toward the null value. This heuristic works correctly only when additional conditions are met, such as independence of classification errors. We present examples to show that, even when the additional conditions are met, if the Misclassification is only approximately non-differential, then bias is not guaranteed to be toward the null. In light of such examples, we advise that evaluation of Misclassification should not be based on the assumption of exact non-differentiality unless the latter can be deduced logically from the facts of the situation.
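
    A small numerical illustration of this point, with made-up counts: exactly non-differential errors pull the expected odds ratio toward the null, while a modest case/non-case difference in specificity can push it past the true value.

    ```python
    # Expected observed odds ratio under misclassified exposure (illustrative numbers).
    def expected_or(true_cases, true_noncases, se_case, sp_case, se_non, sp_non):
        """Expected observed OR given true [exposed, unexposed] counts and the
        sensitivity/specificity of exposure classification in cases and non-cases."""
        def misclassify(truth, se, sp):
            exposed_obs = truth[0] * se + truth[1] * (1 - sp)
            return exposed_obs, sum(truth) - exposed_obs
        a1, b1 = misclassify(true_cases, se_case, sp_case)
        a0, b0 = misclassify(true_noncases, se_non, sp_non)
        return (a1 / b1) / (a0 / b0)

    cases, noncases = [50, 50], [100, 300]   # true OR = (50/50)/(100/300) = 3.0
    # Exactly non-differential (Se = 0.95, Sp = 0.99 in both groups): ~2.84, toward the null.
    print(round(expected_or(cases, noncases, 0.95, 0.99, 0.95, 0.99), 2))
    # Nearly non-differential (case specificity 0.96 instead of 0.99): ~3.02, away from the null.
    print(round(expected_or(cases, noncases, 0.95, 0.96, 0.95, 0.99), 2))
    ```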

Sander Greenland - One of the best experts on this subject based on the ideXlab platform.

  • How far from non-differential does exposure or disease Misclassification have to be to bias measures of association away from the null?
    International Journal of Epidemiology, 2008
    Co-Authors: Anne M Jurek, Sander Greenland, George Maldonado
    Abstract:

    A well-known heuristic in epidemiology is that non-differential exposure or disease Misclassification biases the expected values of an estimator toward the null value. This heuristic works correctly only when additional conditions are met, such as independence of classification errors. We present examples to show that, even when the additional conditions are met, if the Misclassification is only approximately non-differential, then bias is not guaranteed to be toward the null. In light of such examples, we advise that evaluation of Misclassification should not be based on the assumption of exact non-differentiality unless the latter can be deduced logically from the facts of the situation.

  • Sensitivity analysis of Misclassification: a graphical and a Bayesian approach
    Annals of Epidemiology, 2006
    Co-Authors: Haitao Chu, Stephen R Cole, Zhaojie Wang, Sander Greenland
    Abstract:

    Purpose: Misclassification can produce bias in measures of association. Sensitivity analyses have been suggested to explore the impact of such bias, but they do not supply formally justified interval estimates. Methods: To account for exposure Misclassification, recently developed Bayesian approaches were extended to incorporate prior uncertainty about, and correlation between, sensitivity and specificity. Under non-differential Misclassification, a contour plot is used to depict relations among the corrected odds ratio, sensitivity, and specificity. Results: The methods are illustrated by application to a case–control study of cigarette smoking and invasive pneumococcal disease while varying the distributional assumptions about sensitivity and specificity. Results are compared with those of conventional methods, which do not account for Misclassification, and with a sensitivity analysis that assumes fixed sensitivity and specificity. Conclusion: By using Bayesian methods, investigators can incorporate uncertainty about Misclassification into probabilistic inferences.
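
    A simplified Monte Carlo sketch in the spirit of this approach, assuming correlated logit-normal priors on sensitivity and specificity and placeholder case-control counts; the paper's fully Bayesian analysis additionally conditions on the data likelihood rather than only propagating the priors.

    ```python
    # Correlated priors on Se/Sp propagated through the standard 2x2 correction.
    import numpy as np

    rng = np.random.default_rng(1)
    n_draws = 100_000

    # Correlated priors on logit(Se) and logit(Sp) (correlation 0.6, illustrative).
    mean = np.array([np.log(0.85 / 0.15), np.log(0.95 / 0.05)])
    sd = np.array([0.4, 0.4])
    corr = 0.6
    cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                    [corr * sd[0] * sd[1], sd[1] ** 2]])
    logits = rng.multivariate_normal(mean, cov, n_draws)
    se, sp = 1 / (1 + np.exp(-logits[:, 0])), 1 / (1 + np.exp(-logits[:, 1]))

    # Placeholder observed counts: [exposed, unexposed] in cases and controls.
    cases, controls = np.array([80.0, 120.0]), np.array([60.0, 240.0])

    def correct(obs, se, sp):
        truly_exposed = (obs[0] - obs.sum() * (1 - sp)) / (se + sp - 1)
        return truly_exposed, obs.sum() - truly_exposed

    a1, b1 = correct(cases, se, sp)
    a0, b0 = correct(controls, se, sp)
    ok = (np.minimum.reduce([a1, b1, a0, b0]) > 0) & (se + sp > 1)
    or_corr = (a1[ok] / b1[ok]) / (a0[ok] / b0[ok])
    print("median corrected OR:", round(np.median(or_corr), 2))
    print("95% simulation interval:", np.percentile(or_corr, [2.5, 97.5]).round(2))
    ```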

  • A method to automate probabilistic sensitivity analyses of misclassified binary variables
    International Journal of Epidemiology, 2005
    Co-Authors: Matthew P Fox, Timothy L Lash, Sander Greenland
    Abstract:

    Background: Misclassification bias is present in most studies, yet uncertainty about its magnitude or direction is rarely quantified. Methods: The authors present a method for probabilistic sensitivity analysis to quantify the likely effects of Misclassification of a dichotomous outcome, exposure, or covariate. The method involves reconstructing the data that would have been observed had the misclassified variable been correctly classified, given the sensitivity and specificity of classification. An accompanying SAS macro implements the method and allows users to specify ranges for the sensitivity and specificity parameters, yielding simulation intervals that incorporate both systematic and random error. Results: The authors illustrate the method and the accompanying SAS macro code by applying them to a study of the relation between occupational resin exposure and lung-cancer deaths. Results from this method are compared with the conventional result, which accounts for random error only, and with the original sensitivity analysis results. Conclusion: By accounting for plausible degrees of Misclassification, investigators can present study results in a way that incorporates uncertainty about the bias due to Misclassification, and so avoid misleadingly precise-looking results.
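
    A rough Python analogue of the procedure (the original is a SAS macro, not reproduced here), with placeholder counts and uniform sensitivity/specificity ranges; random error is folded in by adding a normal deviate on the log odds ratio scale, a common simplification of the record-level approach.

    ```python
    # Probabilistic sensitivity analysis combining systematic and random error.
    import numpy as np

    rng = np.random.default_rng(3)
    n_iter = 100_000

    # Placeholder observed counts: [exposed, unexposed] among cases and non-cases.
    cases, noncases = np.array([45.0, 94.0]), np.array([257.0, 945.0])

    # Conventional standard error of the log odds ratio from the observed table.
    se_log_or = np.sqrt((1 / cases + 1 / noncases).sum())

    # Sample classification parameters from uniform ranges (non-differential here).
    sens = rng.uniform(0.75, 0.95, n_iter)
    spec = rng.uniform(0.90, 0.99, n_iter)

    def correct(obs, sens, spec):
        truly_exposed = (obs[0] - obs.sum() * (1 - spec)) / (sens + spec - 1)
        return truly_exposed, obs.sum() - truly_exposed

    a1, b1 = correct(cases, sens, spec)
    a0, b0 = correct(noncases, sens, spec)
    ok = np.minimum.reduce([a1, b1, a0, b0]) > 0
    log_or_sys = np.log((a1[ok] / b1[ok]) / (a0[ok] / b0[ok]))

    # Add conventional random error so the interval reflects both error sources.
    log_or_total = log_or_sys + rng.normal(0.0, se_log_or, log_or_sys.size)
    print("median OR adjusted for misclassification:", round(np.exp(np.median(log_or_sys)), 2))
    print("95% simulation interval (systematic + random):",
          np.exp(np.percentile(log_or_total, [2.5, 97.5])).round(2))
    ```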

Taghi M. Khoshgoftaar - One of the best experts on this subject based on the ideXlab platform.

  • A multi-objective software quality classification model using genetic programming
    IEEE Transactions on Reliability, 2007
    Co-Authors: Taghi M. Khoshgoftaar
    Abstract:

    A key factor in the success of a software project is achieving the best possible software reliability within the allotted time and budget. Classification models that provide a risk-based software quality prediction, such as fault-prone and not fault-prone, are effective in focusing software quality assurance efforts. However, their usefulness largely depends on whether all the predicted fault-prone modules can be inspected or improved with the allocated software quality-improvement resources, and on the project-specific costs of Misclassifications. Therefore, a practical goal of calibrating classification models is to lower the expected cost of Misclassification while making cost-effective use of the available software quality-improvement resources. This paper presents a genetic-programming-based decision tree model which facilitates multi-objective optimization in the context of the software quality classification problem. The first objective is to minimize the "Modified Expected Cost of Misclassification", which is our recently proposed goal-oriented measure for selecting and evaluating classification models. The second objective is to optimize the number of predicted fault-prone modules such that it is equal to the number of modules which can be inspected with the allocated resources. Some commonly used classification techniques, such as logistic regression, decision trees, and analogy-based reasoning, are not suited to directly optimizing multi-objective criteria. In contrast, genetic programming is particularly well suited to this multi-objective optimization problem. An empirical case study of a real-world industrial software system demonstrates the promising results and the usefulness of the proposed model.
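
    A sketch of the two objectives above used as a simple ranking function over candidate classifiers. The genetic-programming machinery (tree representation, crossover, mutation) is omitted, the confusion counts are invented, and ranking by a tuple is a lexicographic simplification of the paper's multi-objective optimization.

    ```python
    # Two-objective ranking of candidate models: misclassification cost first,
    # then closeness of the predicted fault-prone count to the inspection budget.
    def objectives(candidate, budget, cost_ratio=20):
        """Return (misclassification cost, budget mismatch); smaller is better."""
        n_pred_fp = candidate["n_type_1"] + candidate["n_true_fp"]
        cost = candidate["n_type_1"] + cost_ratio * candidate["n_type_2"]
        return cost, abs(n_pred_fp - budget)

    # Hypothetical candidates (70 truly fp modules): Type I = nfp predicted fp,
    # Type II = fp predicted nfp, n_true_fp = fp modules correctly flagged.
    candidates = [
        {"name": "tree_1", "n_type_1": 70, "n_type_2": 12, "n_true_fp": 58},
        {"name": "tree_2", "n_type_1": 45, "n_type_2": 18, "n_true_fp": 52},
        {"name": "tree_3", "n_type_1": 90, "n_type_2": 8, "n_true_fp": 62},
    ]
    for c in candidates:
        print(c["name"], objectives(c, budget=100))
    best = min(candidates, key=lambda c: objectives(c, budget=100))
    print("selected:", best["name"])
    ```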

  • Resource-oriented selection of rule-based classification models: an empirical case study
    Software Quality Journal, 2006
    Co-Authors: Taghi M. Khoshgoftaar, Angela Herzberg, Naeem Seliya
    Abstract:

    The amount of resources allocated for software quality improvements is often not enough to achieve the desired software quality. Software quality classification models that yield a risk-based quality estimation of program modules, such as fault-prone (fp) and not fault-prone (nfp), are useful software quality assurance techniques. Their usefulness is largely dependent on whether enough resources are available for inspecting the fp modules. Since a given development project has its own budget and time limitations, a resource-based approach to software quality improvement seems more appropriate for achieving its quality goals. A classification model should provide quality-improvement guidance that maximizes resource utilization. We present a procedure for building software quality classification models from the limited-resources perspective. The essence of the procedure is the use of our recently proposed Modified Expected Cost of Misclassification (MECM) measure for developing resource-oriented software quality classification models. The measure penalizes a model, in terms of the costs of Misclassifications, if the model predicts more fp modules than can be inspected with the allotted resources. Our analysis is presented in the context of our Rule-Based Classification Modeling (RBCM) technique. An empirical case study of a large-scale software system demonstrates the promising results of using the MECM measure to select an appropriate rule-based classification model under the given resource constraints.
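
    A hedged sketch of a resource-penalized cost in the spirit of MECM as characterized above; the exact MECM formula is in the cited paper. Here the penalty simply treats predicted fault-prone modules beyond the inspection budget as if they were missed (an assumption), and all counts are hypothetical.

    ```python
    # Resource-penalized misclassification cost, normalized by the module count.
    def penalized_cost(n_type_1, n_type_2, n_pred_fp, budget, n_total, cost_ratio):
        """Type I = nfp predicted fp, Type II = fp predicted nfp, cost_ratio = C_II/C_I."""
        overflow = max(0, n_pred_fp - budget)     # fp predictions nobody can inspect
        return (n_type_1 + cost_ratio * (n_type_2 + overflow)) / n_total

    # Two hypothetical models on 1000 modules (70 truly fp), inspection budget 100:
    # the "wide" model flags 180 modules, the "narrow" model flags 90.
    print("wide  :", penalized_cost(120, 10, 180, budget=100, n_total=1000, cost_ratio=20))
    print("narrow:", penalized_cost(50, 30, 90, budget=100, n_total=1000, cost_ratio=20))
    ```

    Under this penalty the narrower model scores better (0.65 vs 1.92) even though it misses more truly fault-prone modules, which is the resource-oriented trade-off the abstract describes.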

  • Comparative assessment of software quality classification techniques: an empirical case study
    Empirical Software Engineering, 2004
    Co-Authors: Taghi M. Khoshgoftaar, Naeem Seliya
    Abstract:

    Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can assist in directing quality-improvement efforts to modules that are likely to be fp during operations, thereby cost-effectively utilizing the software quality testing and enhancement resources. Since several classification techniques are available, a relative comparative study of commonly used classification techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and/or tools. These include logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of Misclassification (ECM) is introduced as a single unified measure to compare the performances of different software quality classification models. A function of the costs of Type I (an nfp module misclassified as fp) and Type II (an fp module misclassified as nfp) Misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model depends on the system-specific costs of Misclassifications. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized complete block design is used, in which the system release is treated as a block and the modeling method as a factor. It is observed that the predictive performances of the models differ significantly across the system releases, implying that in the software engineering domain prediction models are influenced by the characteristics of the data and the system being modeled. Multiple pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the performance of the seven classification techniques is compared with a classification based simply on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
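
    A sketch of ECM compared across cost ratios, using one common normalized form, (C_I*n_I + C_II*n_II)/N with C_I = 1; the confusion counts and ratios below are illustrative, not values from the case study.

    ```python
    # Expected cost of misclassification (normalized) evaluated over cost ratios.
    def ecm(n_type_1, n_type_2, n_total, cost_ratio):
        """Normalized expected cost with C_I = 1 and C_II = cost_ratio."""
        return (n_type_1 + cost_ratio * n_type_2) / n_total

    # Hypothetical confusion counts on 1000 modules:
    # Type I = nfp module predicted fp, Type II = fp module predicted nfp.
    models = {"model_A": (300, 2), "model_B": (30, 30)}

    for ratio in (5, 10, 20):                 # cost ratios of interest to a project
        costs = {name: ecm(t1, t2, 1000, ratio) for name, (t1, t2) in models.items()}
        best = min(costs, key=costs.get)
        summary = ", ".join(f"{name}: {cost:.2f}" for name, cost in costs.items())
        print(f"cost ratio {ratio:>2} -> {summary}  (prefer {best})")
    ```

    The preferred model flips as the cost ratio grows (model_B at a ratio of 5, model_A at 10 and above), which is why the study compares models across the range of cost ratios relevant to the project.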

Teshia G. Arambula Solomon - One of the best experts on this subject based on the ideXlab platform.

  • Blood Politics, Ethnic Identity, and Racial Misclassification among American Indians and Alaska Natives
    Journal of environmental and public health, 2014
    Co-Authors: Emily A. Haozous, Carolyn June Strickland, Janelle F. Palacios, Teshia G. Arambula Solomon
    Abstract:

    Misclassification of race in medical and mortality records has long been documented as an issue in American Indian/Alaska Native data, yet little has been presented as a cohesive narrative outlining why Misclassification of American Indian/Alaska Native identity occurs. The purpose of this paper is to summarize the current state of the science on racial Misclassification among American Indians and Alaska Natives. We also provide historical context on the importance of this problem and describe the ongoing political processes that both affect racial Misclassification and contribute to the context of American Indian and Alaska Native identity.

James D Stamey - One of the best experts on this subject based on the ideXlab platform.

  • Bayesian sample size determination for binary regression with a misclassified covariate and no gold standard
    Computational Statistics & Data Analysis, 2012
    Co-Authors: Daniel P. Beavers, James D Stamey
    Abstract:

    Covariate Misclassification is a common problem in epidemiology, genetics, and other biomedical areas. Because this form of Misclassification is known to bias estimators, accounting for it at the design stage is highly important. In this paper, we extend previous work on response Misclassification by developing a Bayesian approach to sample size determination for a covariate Misclassification model with no gold standard. Our procedure considers both conditionally independent tests and tests in which dependence exists between classifiers. We specifically consider a Bayesian power criterion for the sample size determination scheme, and we demonstrate the improvement in model power for our dual-classifier approach compared with a naive single-classifier approach.
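
    A generic sketch of simulation-based Bayesian sample size determination with a power criterion, shown on a deliberately simplified model (a two-group binomial comparison with conjugate beta priors and no misclassification layer); the paper's method replaces this analysis model with the binary regression and dual error-prone classifiers described above.

    ```python
    # Simulation-based Bayesian power: P(95% credible interval excludes the null)
    # under design priors, scanned over candidate sample sizes.
    import numpy as np

    rng = np.random.default_rng(11)

    def bayesian_power(n_per_group, n_datasets=500, n_post=2000):
        """P(95% credible interval for p1 - p0 excludes 0) under the design priors."""
        successes = 0
        for _ in range(n_datasets):
            # Design (sampling) priors encode the anticipated effect size.
            p1, p0 = rng.beta(30, 70), rng.beta(15, 85)
            y1 = rng.binomial(n_per_group, p1)
            y0 = rng.binomial(n_per_group, p0)
            # Conjugate Beta(1, 1) analysis priors; posterior draws of the risk difference.
            diff = rng.beta(1 + y1, 1 + n_per_group - y1, n_post) \
                 - rng.beta(1 + y0, 1 + n_per_group - y0, n_post)
            lo, hi = np.percentile(diff, [2.5, 97.5])
            successes += (lo > 0) or (hi < 0)
        return successes / n_datasets

    for n in (50, 100, 200, 400):
        print(f"n per group = {n:>3}: Bayesian power ~ {bayesian_power(n):.2f}")
    ```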