Data Mechanism

The Experts below are selected from a list of 949479 Experts worldwide ranked by ideXlab platform

Ian R. White - One of the best experts on this subject based on the ideXlab platform.

  • rctmiss: Stata module to analyse a randomised controlled trial (RCT) allowing for informatively missing outcome data
    Statistical Software Components, 2017
    Co-Authors: Ian R. White
    Abstract:

    rctmiss analyses a randomised controlled trial with missing outcome data under a range of assumptions about the missing data. The data and missingness are modelled jointly using either a pattern-mixture model (modelling the differences between missing and observed data) or a selection model (modelling the missing data mechanism). Assumptions about the missing data are expressed via a sensitivity parameter delta which measures the degree of departure from missing at random. Results can be obtained for a single assumption (a single value of delta, possibly varying between individuals), or graphed over a range of assumptions (a range of values of delta).

  • A guide to handling missing data in cost-effectiveness analysis conducted within randomised controlled trials
    PharmacoEconomics, 2014
    Co-Authors: Rita Faria, Manuel Gomes, David Epstein, Ian R. White
    Abstract:

    Missing data are a frequent problem in cost-effectiveness analysis (CEA) within a randomised controlled trial. Inappropriate methods to handle missing data can lead to misleading results and ultimately can affect the decision of whether an intervention is good value for money. This article provides practical guidance on how to handle missing data in within-trial CEAs following a principled approach: (i) the analysis should be based on a plausible assumption for the missing data mechanism, i.e. whether the probability that data are missing is independent of or dependent on the observed and/or unobserved values; (ii) the method chosen for the base-case should fit with the assumed mechanism; and (iii) sensitivity analysis should be conducted to explore to what extent the results change with the assumption made. This approach is implemented in three stages, which are described in detail: (1) descriptive analysis to inform the assumption on the missing data mechanism; (2) how to choose between alternative methods given their underlying assumptions; and (3) methods for sensitivity analysis. The case study illustrates how to apply this approach in practice, including software code. The article concludes with recommendations for practice and suggestions for future research.

  • Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values
    Statistics in Medicine, 2010
    Co-Authors: Ian R. White, John B Carlin
    Abstract:

    When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete-case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
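The FICO index named at the end of the abstract above can be illustrated with a short sketch. It takes the abstract's wording literally: among the cases where a given covariate is observed, count the fraction that are incomplete (missing at least one analysis variable). The function name `fico`, the dict-per-row layout with `None` for missing values, and the toy rows are all assumptions made for illustration; the paper itself should be consulted for the exact definition and for how to interpret particular values.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical data layout: one dict per case, None marks a missing value.
Row = Dict[str, Optional[float]]


def fico(rows: List[Row], covariate: str, analysis_vars: Sequence[str]) -> float:
    """Fraction of incomplete cases among cases where `covariate` is observed."""
    observed = [r for r in rows if r[covariate] is not None]
    if not observed:
        raise ValueError(f"{covariate!r} is never observed")
    # A case is incomplete if any variable in the analysis model is missing.
    incomplete = [r for r in observed if any(r[v] is None for v in analysis_vars)]
    return len(incomplete) / len(observed)


# Made-up toy data, not taken from the paper.
rows: List[Row] = [
    {"y": 1.0, "x1": 0.5, "x2": 2.0},
    {"y": 0.0, "x1": 1.5, "x2": None},
    {"y": 1.0, "x1": None, "x2": 3.0},
    {"y": 0.0, "x1": 2.5, "x2": None},
]
model_vars = ["y", "x1", "x2"]

fico_x1 = fico(rows, "x1", model_vars)  # x1 observed in 3 cases, 2 of them incomplete
fico_x2 = fico(rows, "x2", model_vars)  # x2 observed in 2 cases, 1 of them incomplete
print(f"FICO(x1) = {fico_x1:.3f}, FICO(x2) = {fico_x2:.3f}")
```

Computing the index per covariate, rather than once per data set, follows the abstract's phrase "among the observed values of a covariate".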

Terry E Duncan - One of the best experts on this subject based on the ideXlab platform.

  • A comparison of model- and multiple imputation-based approaches to longitudinal analyses with partial missingness
    Structural Equation Modeling, 1998
    Co-Authors: Terry E Duncan, Susan C Duncan
    Abstract:

    Longitudinal data sets typically suffer from attrition and other forms of missing data. Omissions, attrition, and planned missingness have limited our ability to conduct the most appropriate analyses. When this common problem occurs, several researchers have demonstrated that correct estimation with missing data can be obtained under mild assumptions concerning the missing data mechanism. An example application of latent growth curve methodology, analyzing the longitudinal developmental change in adolescent alcohol consumption, is presented (N = 586; 250 young men and 336 young women, 14 to 16 years of age at the initial time point). The analyses are conducted within a cohort‐sequential design, incorporating missingness introduced by design and due to attrition. We describe and illustrate 3 approaches to the analysis of missing data when some data are missing: multiple‐sample structural equation modeling procedures, raw maximum likelihood analyses, and multiple modeling and data augmentation algorithms.

  • Modeling incomplete longitudinal substance use data using latent variable growth curve methodology
    Multivariate Behavioral Research, 1994
    Co-Authors: Susan C Duncan, Terry E Duncan
    Abstract:

    Longitudinal data sets typically suffer from attrition and other forms of missing data. When this common problem occurs, several researchers have demonstrated that correct maximum likelihood estimation with missing data can be obtained under mild assumptions concerning the missing data mechanism. With reasonable substantive theory, a mixture of cross-sectional and longitudinal methods developed within multiple-group structural equation modeling can provide a strong basis for inference about developmental change. Using an approach to the analysis of missing data, the present study investigated developmental trends in adolescent (N = 759) alcohol, marijuana, and cigarette use across a 5-year period using multiple-group latent growth modeling. An associative model revealed that common developmental trends existed for all three substances. Age and gender were included in the model as predictors of initial status and developmental change. Findings discuss the utility of latent variable structural equation modeling techniques and missing data approaches in the study of developmental change.

Louise Ryan - One of the best experts on this subject based on the ideXlab platform.

  • On the impact of nonresponse in logistic regression: application to the 45 and Up Study
    BMC Medical Research Methodology, 2017
    Co-Authors: Joanna J J Wang, Mark Bartlett, Louise Ryan
    Abstract:

    In longitudinal studies, nonresponse to follow-up surveys poses a major threat to validity, interpretability and generalisation of results. The problem of nonresponse is further complicated by the possibility that nonresponse may depend on the outcome of interest. We identified sociodemographic, general health and wellbeing characteristics associated with nonresponse to the follow-up questionnaire and assessed the extent and effect of nonresponse on statistical inference in a large-scale population cohort study. We obtained the data from the baseline and first wave of the follow-up survey of the 45 and Up Study. Of those who were invited to participate in the follow-up survey, 65.2% responded. A logistic regression model was used to identify baseline characteristics associated with follow-up response. A Bayesian selection model approach with sensitivity analysis was implemented to model nonignorable nonresponse. Characteristics associated with a higher likelihood of responding to the follow-up survey include female gender, age categories 55–74, high educational qualification, married/de facto, worked part or partially or fully retired and higher household income. Parameter estimates and conclusions are generally consistent across different assumptions on the missing data mechanism. However, we observed some sensitivity for variables that are strong predictors for both the outcome and nonresponse. Results indicated that, in the context of the binary outcome under study, nonresponse did not result in substantial bias and did not alter the interpretation of results in general. Conclusions were still largely robust under a nonignorable missing data mechanism. Use of a Bayesian selection model is recommended as a useful strategy for assessing potential sensitivity of results to missing data.
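The idea of checking whether conclusions hold across different assumptions about nonresponse can be sketched in a few lines. This is not the Bayesian selection model used in the study; it swaps in a much simpler pattern-mixture style adjustment, in which the log-odds of a binary outcome among nonrespondents are assumed to differ from those among respondents by a sensitivity parameter `delta` (with `delta = 0` meaning no difference). The 65.2% response rate comes from the abstract; the respondent prevalence `P_OBS`, the grid of `delta` values, and the function name are hypothetical.

```python
import math


def adjusted_prevalence(p_obs: float, response_rate: float, delta: float) -> float:
    """Overall prevalence if nonrespondents' log-odds are shifted by `delta`."""
    logit_obs = math.log(p_obs / (1.0 - p_obs))
    # Assumed prevalence among nonrespondents under the delta shift.
    p_mis = 1.0 / (1.0 + math.exp(-(logit_obs + delta)))
    # Mix respondents and nonrespondents by their population shares.
    return response_rate * p_obs + (1.0 - response_rate) * p_mis


RESPONSE_RATE = 0.652  # reported in the abstract
P_OBS = 0.30           # hypothetical prevalence among respondents

# Hypothetical grid of departures from "nonrespondents resemble respondents".
deltas = [-1.0, -0.5, 0.0, 0.5, 1.0]
results = {d: adjusted_prevalence(P_OBS, RESPONSE_RATE, d) for d in deltas}

for d, p in results.items():
    print(f"delta = {d:+.1f}  ->  overall prevalence = {p:.3f}")
```

If the quantity of interest barely moves across a plausible range of `delta`, the conclusion is insensitive to the nonresponse assumption in this simplified sense; a wide spread would argue for a fuller model such as the one the authors recommend.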

Susan C Duncan - One of the best experts on this subject based on the ideXlab platform.

  • A comparison of model- and multiple imputation-based approaches to longitudinal analyses with partial missingness
    Structural Equation Modeling, 1998
    Co-Authors: Terry E Duncan, Susan C Duncan
    Abstract:

    Longitudinal data sets typically suffer from attrition and other forms of missing data. Omissions, attrition, and planned missingness have limited our ability to conduct the most appropriate analyses. When this common problem occurs, several researchers have demonstrated that correct estimation with missing data can be obtained under mild assumptions concerning the missing data mechanism. An example application of latent growth curve methodology, analyzing the longitudinal developmental change in adolescent alcohol consumption, is presented (N = 586; 250 young men and 336 young women, 14 to 16 years of age at the initial time point). The analyses are conducted within a cohort‐sequential design, incorporating missingness introduced by design and due to attrition. We describe and illustrate 3 approaches to the analysis of missing data when some data are missing: multiple‐sample structural equation modeling procedures, raw maximum likelihood analyses, and multiple modeling and data augmentation algorithms.

  • Modeling incomplete longitudinal substance use data using latent variable growth curve methodology
    Multivariate Behavioral Research, 1994
    Co-Authors: Susan C Duncan, Terry E Duncan
    Abstract:

    Longitudinal data sets typically suffer from attrition and other forms of missing data. When this common problem occurs, several researchers have demonstrated that correct maximum likelihood estimation with missing data can be obtained under mild assumptions concerning the missing data mechanism. With reasonable substantive theory, a mixture of cross-sectional and longitudinal methods developed within multiple-group structural equation modeling can provide a strong basis for inference about developmental change. Using an approach to the analysis of missing data, the present study investigated developmental trends in adolescent (N = 759) alcohol, marijuana, and cigarette use across a 5-year period using multiple-group latent growth modeling. An associative model revealed that common developmental trends existed for all three substances. Age and gender were included in the model as predictors of initial status and developmental change. Findings discuss the utility of latent variable structural equation modeling techniques and missing data approaches in the study of developmental change.

Jun Shao - One of the best experts on this subject based on the ideXlab platform.

  • Approximate conditional likelihood for generalized linear models with general missing data mechanism
    Journal of Systems Science & Complexity, 2017
    Co-Authors: Jiwei Zhao, Jun Shao
    Abstract:

    The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions comprehensively used. When missing values are present, many existing methods in the literature heavily depend on an unverifiable assumption of the missing data mechanism, and they fail when the assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible, which includes both ignorable and nonignorable missing data cases, as well as both scenarios of missing values in response and covariate. Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For parameters that are identifiable, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Some simulation studies are conducted to evaluate finite sample performance of the proposed estimators as well as estimators from some existing methods. Finally, the authors present a biomarker analysis in a prostate cancer study to illustrate the proposed method.

  • Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data
    Journal of the American Statistical Association, 2015
    Co-Authors: Jiwei Zhao, Jun Shao
    Abstract:

    We consider identifiability and estimation in a generalized linear model in which the response variable and some covariates have missing values and the missing data mechanism is nonignorable and unspecified. We adopt a pseudo-likelihood approach that makes use of an instrumental variable to help identify unknown parameters in the presence of nonignorable missing data. Explicit conditions for the identifiability of parameters are given. Some asymptotic properties of the parameter estimators based on maximizing the pseudo-likelihood are established. An explicit asymptotic covariance matrix and its estimator are also derived in some cases. For the numerical maximization of the pseudo-likelihood, we develop a two-step iteration algorithm that decomposes a nonconcave maximization problem into two problems of maximizing concave functions. Some simulation results and an application to a dataset from cotton factory workers are also presented.