Multiple Imputation

Joseph L. Schafer - One of the best experts on this subject based on the ideXlab platform.

  • Multiple Edit/Multiple Imputation for Multivariate Continuous Data
    Journal of the American Statistical Association, 2003
    Co-Authors: Bonnie Ghosh-Dastidar, Joseph L. Schafer
    Abstract:

    Multiple Imputation replaces an incomplete dataset with m > 1 simulated complete versions that are analyzed separately by standard methods. We present a natural extension of Multiple Imputation for handling the dual problems of nonresponse and response error. This extension, which we call Multiple edit/Multiple Imputation (MEMI), replaces an observed dataset containing missing values and errors with m > 1 simulated versions of the ideal dataset that is complete and error-free. These ideal data sets are analyzed separately, and the results are combined using the same rules as for Multiple Imputation. The resulting inferences simultaneously reflect uncertainty due to nonresponse and response error. MEMI may be an attractive alternative to deterministic or quasi-statistical edit and Imputation procedures used by many data-collecting agencies. Producing MEMI's requires assumptions about the distribution of the ideal data, the nature of nonresponse, and a model for the response error mechanism. However, fitting such a model does not necessarily require data from a follow-up study. In this article we develop and implement MEMI for preliminary data from the Third National Health and Nutrition Examination Survey, Phase I (1988–1991). Raw body measurements for 1,345 children age 2–3 years are imputed under a Bayesian model for intermittent or semicontinuous errors. The resulting population estimates are found to be quite insensitive to prior assumptions about the rates and magnitude of errors.
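
The combining rules mentioned in the abstract above are usually taken to be Rubin's rules. The following is a minimal sketch of those rules for a scalar quantity, not code from the article: the function name `pool_rubin` and its inputs (one point estimate and one squared standard error per imputed dataset) are assumptions made for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results for a scalar quantity (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b  # total variance

    # Classical large-sample degrees of freedom for the t reference distribution.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(q, t, df)
```

The between-imputation term is what carries the extra uncertainty due to the missing (or, in MEMI, erroneous) values; with a single imputation it cannot be estimated at all.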

  • Multiple Imputation: a primer
    Statistical Methods in Medical Research, 1999
    Co-Authors: Joseph L. Schafer
    Abstract:

    In recent years, Multiple Imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of Multiple Imputation are reviewed, with answers to frequently asked questions about using the method in practice.

James R Carpenter - One of the best experts on this subject based on the ideXlab platform.

  • Propensity score analysis with partially observed covariates: how should Multiple Imputation be used?
    Statistical Methods in Medical Research, 2019
    Co-Authors: Clemence Leyrat, James R Carpenter, Ian R. White, Shaun R Seaman, Ian J Douglas, Liam Smeeth, Matthieu Resche-Rigon, Elizabeth A Williamson
    Abstract:

    Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple Imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement Multiple Imputation for propensity score analysis: (a) should we apply Rubin's rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the Imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after Multiple Imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third Multiple Imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
    Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas Multiple Imputation approaches were approximately unbiased as long as the outcome was included in the Imputation model. Only MIte was unbiased in all the studied scenarios and Rubin's rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using Multiple Imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the Imputation model is the preferred approach.
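
The difference between MIte and MIps described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: treatment and outcome are taken to be fully observed, the propensity scores fitted in each imputed dataset are supplied as plain lists, the effect measure is a risk difference from normalised weights, and all function names are hypothetical.

```python
def iptw_risk_difference(treated, outcome, ps):
    """Weighted difference in mean outcome using normalised IPTW weights."""
    w1 = [z / e for z, e in zip(treated, ps)]              # weights, treated
    w0 = [(1 - z) / (1 - e) for z, e in zip(treated, ps)]  # weights, controls
    mu1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mu1 - mu0


def mi_te(treated, outcome, ps_by_imputation):
    """MIte: estimate the effect in every imputed dataset, then average."""
    effects = [iptw_risk_difference(treated, outcome, ps)
               for ps in ps_by_imputation]
    return sum(effects) / len(effects)


def mi_ps(treated, outcome, ps_by_imputation):
    """MIps: average each subject's propensity score, then estimate once."""
    m = len(ps_by_imputation)
    mean_ps = [sum(subject_scores) / m
               for subject_scores in zip(*ps_by_imputation)]
    return iptw_risk_difference(treated, outcome, mean_ps)


if __name__ == "__main__":
    treated = [1, 1, 0, 0]
    outcome = [1, 0, 0, 0]
    # One list of fitted propensity scores per imputed dataset (made-up values).
    ps_sets = [[0.5, 0.5, 0.5, 0.5], [0.6, 0.4, 0.5, 0.5]]
    print(mi_te(treated, outcome, ps_sets), mi_ps(treated, outcome, ps_sets))
```

When the imputed datasets yield identical propensity scores the two functions agree; they can diverge as the scores vary across imputations, which is the situation the simulation study examines. The sketch covers point estimates only; variance estimation for MIte would apply Rubin's rules to the per-dataset effects.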

  • Multiple Imputation for multilevel data with continuous and binary variables
    Statistical Science, 2018
    Co-Authors: Vincent Audigier, Stef van Buuren, James R Carpenter, Ian R. White, Shahab Jolani, Thomas P. A. Debray, Matteo Quartagno, Matthieu Resche-Rigon
    Abstract:

    We present and compare Multiple Imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising Multiple studies. The comparisons show that these Multiple Imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic Multiple Imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable Multiple Imputation method according to the structure of the data.

  • Multiple Imputation of covariates by fully conditional specification accommodating the substantive model
    Statistical Methods in Medical Research, 2015
    Co-Authors: Jonathan W Bartlett, James R Carpenter, Ian R. White, Shaun R Seaman
    Abstract:

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using Multiple Imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of Multiple Imputation may impute covariates from models that are incompatible with such substantive models. We show how Imputation by fully conditional specification, a popular approach for performing Multiple Imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed Imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.

  • Multiple Imputation and its application
    2013
    Co-Authors: James R Carpenter, Michael G. Kenward
    Abstract:

    A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various Imputation models and associated algorithms, and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues raised by the analysis of partially observed data, and the assumptions on which analyses rest. Presents a practical guide to the issues to consider when analysing incomplete data from both observational studies and randomized trials. Provides a detailed discussion of the practical use of MI with real-world examples drawn from medical and social statistics. Explores handling non-linear relationships and interactions with Multiple Imputation, survival analysis, multilevel Multiple Imputation, sensitivity analysis via Multiple Imputation, using non-response weights with Multiple Imputation and doubly robust Multiple Imputation. Multiple Imputation and its Application is aimed at quantitative researchers and students in the medical and social sciences with the aim of clarifying the issues raised by the analysis of incomplete data, outlining the rationale for MI and describing how to consider and address the issues that arise in its application.

  • Strategies for Multiple Imputation in longitudinal studies
    American Journal of Epidemiology, 2010
    Co-Authors: Michael Spratt, John B Carlin, James R Carpenter, Jonathan A C Sterne, Jon Heron, John Henderson, Kate Tilling
    Abstract:

    Multiple Imputation is increasingly recommended in epidemiology to adjust for the bias and loss of information that may occur in analyses restricted to study participants with complete data ("complete-case analyses"). However, little guidance is available on applying the method, including which variables to include in the Imputation model and the number of Imputations needed. Here, the authors used Multiple Imputation to analyze the prevalence of wheeze among 81-month-old children in the Avon Longitudinal Study of Parents and Children (Avon, United Kingdom; 1991-1999) and the association of wheeze with gender, maternal asthma, and maternal smoking. The authors examined how inclusion of different types of variables in the Imputation model affected point estimates and precision, and assessed the impact of number of Imputations on Monte Carlo variability. Inclusion of variables associated with the outcome in the Imputation model increased odds ratios and reduced standard errors. When only 5 or 10 Imputations were used, variability due to the Imputation procedure was substantial enough to affect conclusions. Careful preliminary analysis identified the scope for Multiple Imputation to reduce bias and improve efficiency and provided guidance for building the Imputation model. When data are missing, such preliminary analyses should be routinely undertaken and reported, regardless of whether Multiple Imputation is used in the final analysis.
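
The dependence of Monte Carlo variability on the number of Imputations can be made concrete. One common summary, not necessarily the one used in this study, is the Monte Carlo standard error of the pooled point estimate, sqrt(B/m), where B is the between-imputation variance and m the number of Imputations. The sketch below assumes that definition; the function name is hypothetical.

```python
from math import sqrt
from statistics import variance


def monte_carlo_se(estimates):
    """Monte Carlo standard error of the pooled estimate: sqrt(B / m).

    estimates: the point estimate obtained from each imputed dataset.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed-data estimates")
    b = variance(estimates)  # between-imputation variance, divisor m - 1
    return sqrt(b / m)


if __name__ == "__main__":
    # Made-up log odds ratios from m = 5 imputed datasets.
    print(monte_carlo_se([0.42, 0.55, 0.47, 0.61, 0.50]))
```

Because the error shrinks roughly in proportion to 1/sqrt(m), an analyst who finds it large relative to the standard error of the estimate can reduce it by generating more imputed datasets.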

Ian R. White - One of the best experts on this subject based on the ideXlab platform.

  • Propensity score analysis with partially observed covariates: how should Multiple Imputation be used?
    Statistical Methods in Medical Research, 2019
    Co-Authors: Clemence Leyrat, James R Carpenter, Ian R. White, Shaun R Seaman, Ian J Douglas, Liam Smeeth, Matthieu Resche-Rigon, Elizabeth A Williamson
    Abstract:

    Inverse probability of treatment weighting is a popular propensity score-based approach to estimate marginal treatment effects in observational studies at risk of confounding bias. A major issue when estimating the propensity score is the presence of partially observed covariates. Multiple Imputation is a natural approach to handle missing data on covariates: covariates are imputed and a propensity score analysis is performed in each imputed dataset to estimate the treatment effect. The treatment effect estimates from each imputed dataset are then combined to obtain an overall estimate. We call this method MIte. However, an alternative approach has been proposed, in which the propensity scores are combined across the imputed datasets (MIps). Therefore, there are remaining uncertainties about how to implement Multiple Imputation for propensity score analysis: (a) should we apply Rubin's rules to the inverse probability of treatment weighting treatment effect estimates or to the propensity score estimates themselves? (b) does the outcome have to be included in the Imputation model? (c) how should we estimate the variance of the inverse probability of treatment weighting estimator after Multiple Imputation? We studied the consistency and balancing properties of the MIte and MIps estimators and performed a simulation study to empirically assess their performance for the analysis of a binary outcome. We also compared the performance of these methods to complete case analysis and the missingness pattern approach, which uses a different propensity score model for each pattern of missingness, and a third Multiple Imputation approach in which the propensity score parameters are combined rather than the propensity scores themselves (MIpar). 
    Under a missing at random mechanism, complete case and missingness pattern analyses were biased in most cases for estimating the marginal treatment effect, whereas Multiple Imputation approaches were approximately unbiased as long as the outcome was included in the Imputation model. Only MIte was unbiased in all the studied scenarios and Rubin's rules provided good variance estimates for MIte. The propensity score estimated in the MIte approach showed good balancing properties. In conclusion, when using Multiple Imputation in the inverse probability of treatment weighting context, MIte with the outcome included in the Imputation model is the preferred approach.

  • Multiple Imputation for multilevel data with continuous and binary variables
    Statistical Science, 2018
    Co-Authors: Vincent Audigier, Stef van Buuren, James R Carpenter, Ian R. White, Shahab Jolani, Thomas P. A. Debray, Matteo Quartagno, Matthieu Resche-Rigon
    Abstract:

    We present and compare Multiple Imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising Multiple studies. The comparisons show that these Multiple Imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic Multiple Imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable Multiple Imputation method according to the structure of the data.

  • Outcome-sensitive Multiple Imputation: a simulation study
    BMC Medical Research Methodology, 2017
    Co-Authors: Evangelos Kontopantelis, Ian R. White, Matthew Sperrin, Iain Buchan
    Abstract:

    Background: Multiple Imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the Imputation model when imputing missing covariate values, it is not known whether it should be imputed. Similarly, no clear recommendations exist on: the utility of incorporating a secondary outcome, if available, in the Imputation model; the level of protection offered when data are missing not-at-random; the implications of the dataset size and missingness levels. Methods: We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (completely at random; at random; not at random); varying extents of missingness (20–80% missing data); and different sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and seven Multiple Imputation methods which deleted cases with missing outcome before Imputation, after Imputation or not at all; included or did not include the outcome in the Imputation models; and included or did not include a secondary outcome in the Imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario. Results: Overall, there was very little to separate Multiple Imputation methods which included the outcome in the Imputation model. Even when missingness was quite extensive, all Multiple Imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. The dataset size and the extent of missingness affected performance, as expected. Multiple Imputation methods protected less well against missingness not at random, but did offer some protection. Conclusions: As long as the outcome is included in the Imputation model, there are very small performance differences between the possible Multiple Imputation approaches: no outcome Imputation, Imputation or Imputation and deletion. All informative covariates, even with very high levels of missingness, should be included in the Multiple Imputation model. Multiple Imputation offers some protection against a simple missing not at random mechanism.

  • Multiple Imputation of covariates by fully conditional specification accommodating the substantive model
    Statistical Methods in Medical Research, 2015
    Co-Authors: Jonathan W Bartlett, James R Carpenter, Ian R. White, Shaun R Seaman
    Abstract:

    Missing covariate data commonly occur in epidemiological and clinical research, and are often dealt with using Multiple Imputation. Imputation of partially observed covariates is complicated if the substantive model is non-linear (e.g. Cox proportional hazards model), or contains non-linear (e.g. squared) or interaction terms, and standard software implementations of Multiple Imputation may impute covariates from models that are incompatible with such substantive models. We show how Imputation by fully conditional specification, a popular approach for performing Multiple Imputation, can be modified so that covariates are imputed from models which are compatible with the substantive model. We investigate through simulation the performance of this proposal, and compare it with existing approaches. Simulation results suggest our proposal gives consistent estimates for a range of common substantive models, including models which contain non-linear covariate effects or interactions, provided data are missing at random and the assumed Imputation models are correctly specified and mutually compatible. Stata software implementing the approach is freely available.

  • Multiple Imputation by chained equations (MICE): implementation in Stata
    Journal of Statistical Software, 2011
    Co-Authors: Patrick Royston, Ian R. White
    Abstract:

    Missing data are a common occurrence in real datasets. For epidemiological and prognostic factors studies in medicine, Multiple Imputation is becoming the standard route to estimating models with missing covariate data under a missing-at-random assumption. We describe ice, an implementation in Stata of the MICE approach to Multiple Imputation. Real data from an observational study in ovarian cancer are used to illustrate the most important of the many options available with ice. We remark briefly on the new database architecture and procedures for Multiple Imputation introduced in releases 11 and 12 of Stata.
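
For readers unfamiliar with the chained-equations idea behind MICE, the following is a deliberately simplified sketch for two continuous variables. It is not the `ice` command and makes several assumptions: each variable is imputed from a simple linear regression on the other, the imputed value is the prediction plus Gaussian noise, and the regression parameters are not themselves drawn from a posterior, so this is not a fully "proper" imputation. All names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)  # assumes xs are not all equal
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / max(len(xs) - 2, 1)) ** 0.5


def impute_from(target, predictor, missing, rng):
    """Redraw missing entries of target from its regression on predictor."""
    obs_pred = [p for p, m in zip(predictor, missing) if not m]
    obs_targ = [t for t, m in zip(target, missing) if not m]
    a, b, sd = fit_line(obs_pred, obs_targ)
    return [a + b * p + rng.gauss(0, sd) if m else t
            for t, p, m in zip(target, predictor, missing)]


def chained_imputation(x, y, n_cycles=10, seed=0):
    """Return one completed copy of (x, y); None marks a missing value."""
    rng = random.Random(seed)
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Initialise missing entries with the observed mean of each variable.
    x_start = mean(v for v in x if v is not None)
    y_start = mean(v for v in y if v is not None)
    cur_x = [x_start if m else v for v, m in zip(x, miss_x)]
    cur_y = [y_start if m else v for v, m in zip(y, miss_y)]
    for _ in range(n_cycles):
        # Cycle through the incomplete variables, one regression at a time.
        cur_x = impute_from(cur_x, cur_y, miss_x, rng)
        cur_y = impute_from(cur_y, cur_x, miss_y, rng)
    return cur_x, cur_y


if __name__ == "__main__":
    x = [1.0, 2.0, None, 4.0, 5.0, 6.0]
    y = [2.1, None, 6.2, 8.1, 9.9, 12.2]
    # Different seeds give the m completed datasets to analyse and pool.
    for seed in range(3):
        print(chained_imputation(x, y, seed=seed))
```

Production implementations differ in important ways, for example by supporting many variables, variable-specific model types, and parameter draws that reflect estimation uncertainty.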

Bonnie Ghosh-Dastidar - One of the best experts on this subject based on the ideXlab platform.

  • Multiple Edit/Multiple Imputation for Multivariate Continuous Data
    Journal of the American Statistical Association, 2003
    Co-Authors: Bonnie Ghosh-Dastidar, Joseph L. Schafer
    Abstract:

    Multiple Imputation replaces an incomplete dataset with m > 1 simulated complete versions that are analyzed separately by standard methods. We present a natural extension of Multiple Imputation for handling the dual problems of nonresponse and response error. This extension, which we call Multiple edit/Multiple Imputation (MEMI), replaces an observed dataset containing missing values and errors with m > 1 simulated versions of the ideal dataset that is complete and error-free. These ideal data sets are analyzed separately, and the results are combined using the same rules as for Multiple Imputation. The resulting inferences simultaneously reflect uncertainty due to nonresponse and response error. MEMI may be an attractive alternative to deterministic or quasi-statistical edit and Imputation procedures used by many data-collecting agencies. Producing MEMI's requires assumptions about the distribution of the ideal data, the nature of nonresponse, and a model for the response error mechanism. However, fitting such a model does not necessarily require data from a follow-up study. In this article we develop and implement MEMI for preliminary data from the Third National Health and Nutrition Examination Survey, Phase I (1988–1991). Raw body measurements for 1,345 children age 2–3 years are imputed under a Bayesian model for intermittent or semicontinuous errors. The resulting population estimates are found to be quite insensitive to prior assumptions about the rates and magnitude of errors.

Donald B Rubin - One of the best experts on this subject based on the ideXlab platform.

  • Discussion on Multiple Imputation
    International Statistical Review, 2007
    Co-Authors: Donald B Rubin
    Abstract:

    As the "father" of Multiple Imputation (MI), it gives me great pleasure to be able to comment on this collection of contributions on MI. The nice review by Paul Zhang serves as an excellent introduction to the more critical attention lavished on MI by Soren Nielsen and the extensive discussion by Xiao-Li Meng and Martin Romero. I have a few comments on this package, which are designed to clarify a few points and supplement other points from my "applied statistician's" perspective. My focus in the following is more on Nielsen's article because the expressed views are less consistent with my own than the contributions of the other authors. Nevertheless, despite differences of emphasis, I want to express my sincere gratitude to Nielsen for bringing his technical adroitness to address the issue of Multiple Imputation, in particular, and the problem of missing data in general (e.g., Nielsen, 1997, 2000).

  • Multiple Imputation after 18+ Years
    Journal of the American Statistical Association, 1996
    Co-Authors: Donald B Rubin
    Abstract:

    Multiple Imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that Multiple Imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the Multiple Imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on Multiple Imputation have reflected either misunderstandings of the practical objectives of Multiple Imputation or misunderstandings of fundamental theoretical results. Then, criticisms of Multiple Imputation are considered, and, finally, comparisons are made to alt...

  • Multiple Imputation in health care databases: an overview and some applications
    Statistics in Medicine, 1991
    Co-Authors: Donald B Rubin, Nathaniel Schenker
    Abstract:

    Multiple Imputation for non-response replaces each missing value by two or more plausible values. The values can be chosen to represent both uncertainty about the reasons for non-response and uncertainty about which values to impute assuming the reasons for non-response are known. This paper provides an overview of methods for creating and analysing multiply-imputed data sets, and illustrates the dramatic improvements possible when using Multiple rather than single Imputation. A major application of Multiple Imputation to public-use files from the 1970 census is discussed, and several exploratory studies related to health care that have used Multiple Imputation are described.