Handling Missing Data

The Experts below are selected from a list of 8385 Experts worldwide, ranked by the ideXlab platform.

Hua Liang Wei - One of the best experts on this subject based on the ideXlab platform.

  • Handling Missing Data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm
    Neurocomputing, 2017
    Co-Authors: Faraj Bashir, Hua Liang Wei
    Abstract:

    Imputing Missing Data in a multivariate time series Dataset remains a challenging problem. There is an abundance of research on techniques for imputing Missing, biased, or corrupted values in a Dataset. While a great amount of work has been done in this field, most imputation methodologies are centred on a specific application, typically involving static Data analysis or simple time series modelling. These approaches fall short when the Data originate from a multivariate time series. The objective of this paper is to introduce a new algorithm for Handling Missing Data in multivariate time series Datasets. The approach is based on a vector autoregressive (VAR) model and combines an expectation-maximization (EM) algorithm with the prediction error minimization (PEM) method. The new algorithm is called the vector autoregressive imputation method (VAR-IM). A description of the algorithm is presented, along with a case study applying VAR-IM to a real-world Data set of electrocardiogram (ECG) Data. VAR-IM was compared with two traditional methods, list-wise deletion and linear regression substitution, and two modern methods, Multivariate Auto-Regressive State-Space (MARSS) modelling and the expectation-maximization (EM) algorithm. Overall, VAR-IM achieved significant improvement on the imputation tasks compared with the other methods. A summary of the limitations and restrictions of VAR-IM is also presented.
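
    The abstract describes the core loop clearly enough to sketch: alternate between fitting a VAR model on a completed series and re-predicting the missing entries until the imputations stabilise. The Python sketch below is illustrative only, not the authors' implementation: the function name var_impute and its parameters are invented, statsmodels' least-squares VAR estimator stands in for the PEM step, and gaps are refilled with one-step-ahead fitted values.

      # EM-style VAR imputation sketch (assumes numpy and statsmodels).
      import numpy as np
      from statsmodels.tsa.api import VAR

      def var_impute(data, lags=2, n_iter=20, tol=1e-6):
          """Iteratively impute NaNs in a (T x k) multivariate series."""
          filled = np.array(data, dtype=float)
          missing = np.isnan(filled)
          # Initialise gaps with per-channel means so a VAR can be fitted at all.
          col_means = np.nanmean(filled, axis=0)
          filled[missing] = np.take(col_means, np.where(missing)[1])
          for _ in range(n_iter):
              res = VAR(filled).fit(lags)   # refit coefficients (M-step analogue)
              fitted = res.fittedvalues     # one-step-ahead predictions for rows lags..T-1
              new = filled.copy()
              # Overwrite gaps with model predictions (E-step analogue); gaps in
              # the first `lags` rows keep their initial mean fill.
              new[lags:][missing[lags:]] = fitted[missing[lags:]]
              converged = np.max(np.abs(new - filled)) < tol
              filled = new
              if converged:
                  break
          return filled

    In use, one would call var_impute(series) on a (time x channels) array with np.nan marking the gaps.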

  • Handling Missing Data in multivariate time series using a vector autoregressive model-based imputation (VAR-IM) algorithm: Part I, VAR-IM algorithm versus traditional methods
    Mediterranean Conference on Control and Automation, 2016
    Co-Authors: Faraj Bashir, Hua Liang Wei
    Abstract:

    Given an observed Data set, there are different methods that can be used to impute Missing Data. While excellent work has been done in this field, most available approaches focus on particular applications, such as static Data or univariate time series. The primary aim of the two papers (Part I: VAR-IM algorithm versus traditional methods; Part II: VAR-IM algorithm versus modern algorithms) is to introduce an algorithm for Handling Missing Data in multivariate time series, based on a vector autoregressive (VAR) model that combines an expectation-maximization (EM) algorithm with the prediction error minimization (PEM) method. In the first part, we conduct two case studies (one on simulated Data and one on real ECG Data) to compare the proposed algorithm with three traditional methods for imputing Missing Data: mean substitution, list-wise deletion and linear regression substitution. In the second part, the proposed method is compared with more powerful modern techniques: the MARSS package, nearest neighbour imputation, and the full information maximum likelihood (FIML) method. Furthermore, we demonstrate the use of the proposed method with an empirical application to multivariate ECG time series Data and discuss its advantages and limitations.
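
    For reference, the three traditional baselines named above take only a few lines each. The sketch below uses pandas and scikit-learn with invented toy data and column names; it is not code from the paper.

      # List-wise deletion, mean substitution, and regression substitution.
      import numpy as np
      import pandas as pd
      from sklearn.linear_model import LinearRegression

      df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 5.0],
                         "y": [2.1, np.nan, 6.2, 8.1, 9.9]})

      # 1) List-wise deletion: drop every row containing any missing value.
      listwise = df.dropna()

      # 2) Mean substitution: replace each gap with its column mean.
      mean_sub = df.fillna(df.mean())

      # 3) Regression substitution: predict a column's gaps from the others.
      obs = df.dropna()
      reg = LinearRegression().fit(obs[["x"]], obs["y"])
      reg_sub = df.copy()
      gaps = reg_sub["y"].isna() & reg_sub["x"].notna()
      reg_sub.loc[gaps, "y"] = reg.predict(reg_sub.loc[gaps, ["x"]])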

Ian R. White - One of the best experts on this subject based on the ideXlab platform.

  • Should multiple imputation be the method of choice for Handling Missing Data in randomized trials?
    Statistical Methods in Medical Research, 2018
    Co-Authors: Thomas Sullivan, Ian R. White, Amy Salter, Philip Ryan, Katherine J Lee
    Abstract:

    The use of multiple imputation has increased markedly in recent years, and journal reviewers may expect to see multiple imputation used to handle Missing Data. However, in randomized trials, where treatment group is always observed and independent of baseline covariates, other approaches may be preferable. Using Data simulation, we evaluated multiple imputation, performed both overall and separately by randomized group, across a range of commonly encountered scenarios. We considered both Missing outcome and Missing baseline Data, with Missing outcome Data induced under Missing at random mechanisms. Provided the analysis model was correctly specified, multiple imputation produced unbiased treatment effect estimates, but alternative unbiased approaches were often more efficient. When the analysis model overlooked an interaction effect involving randomized group, multiple imputation produced biased estimates of the average treatment effect when applied to Missing outcome Data, unless imputation was performed separately by randomized group. Based on these results, we conclude that multiple imputation should not be seen as the only acceptable way to handle Missing Data in randomized trials. In settings where multiple imputation is adopted, we recommend that imputation be carried out separately by randomized group.
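
    The closing recommendation, imputing separately by randomized group, is straightforward to operationalise. The sketch below is an assumption-laden illustration: scikit-learn's IterativeImputer stands in for a single imputation draw (a genuine multiple-imputation analysis would create several completed data sets and pool the estimates with Rubin's rules), and the column name arm is hypothetical.

      # Impute within each randomised arm so any treatment-by-covariate
      # interaction is preserved (single-imputation stand-in for MI).
      import pandas as pd
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      def impute_by_arm(df: pd.DataFrame, arm_col: str = "arm") -> pd.DataFrame:
          parts = []
          for _, grp in df.groupby(arm_col):
              imputer = IterativeImputer(random_state=0)
              values = imputer.fit_transform(grp.drop(columns=[arm_col]))
              out = pd.DataFrame(values, columns=grp.columns.drop(arm_col),
                                 index=grp.index)
              out[arm_col] = grp[arm_col]
              parts.append(out)
          return pd.concat(parts).sort_index()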

  • A guide to Handling Missing Data in cost-effectiveness analysis conducted within randomised controlled trials
    PharmacoEconomics, 2014
    Co-Authors: Rita Faria, Manuel Gomes, David Epstein, Ian R. White
    Abstract:

    Missing Data are a frequent problem in cost-effectiveness analysis (CEA) within a randomised controlled trial. Inappropriate methods to handle Missing Data can lead to misleading results and ultimately can affect the decision of whether an intervention is good value for money. This article provides practical guidance on how to handle Missing Data in within-trial CEAs following a principled approach: (i) the analysis should be based on a plausible assumption for the Missing Data mechanism, i.e. whether the probability that Data are Missing is independent of or dependent on the observed and/or unobserved values; (ii) the method chosen for the base-case should fit with the assumed mechanism; and (iii) sensitivity analysis should be conducted to explore to what extent the results change with the assumption made. This approach is implemented in three stages, which are described in detail: (1) descriptive analysis to inform the assumption on the Missing Data mechanism; (2) how to choose between alternative methods given their underlying assumptions; and (3) methods for sensitivity analysis. The case study illustrates how to apply this approach in practice, including software code. The article concludes with recommendations for practice and suggestions for future research.
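
    Stage (3), sensitivity analysis, can be as simple as re-running the base-case analysis while shifting the imputed values by a range of offsets (a delta adjustment) and checking whether the incremental cost moves. The sketch below is purely illustrative: the data are simulated, the mean imputation is deliberately crude, and the delta grid is an arbitrary assumption, not the article's code.

      # Delta-adjustment sensitivity analysis on imputed costs (illustrative).
      import numpy as np

      rng = np.random.default_rng(0)
      cost = rng.gamma(2.0, 500.0, size=200)        # per-patient costs
      arm = np.repeat([0, 1], 100)                  # two trial arms
      miss = rng.random(200) < 0.2                  # ~20% of costs unobserved
      imputed = np.where(miss, cost[~miss].mean(), cost)  # crude base-case fill

      for delta in [0.0, 250.0, 500.0]:             # shift only the imputed values
          adj = imputed + delta * miss
          diff = adj[arm == 1].mean() - adj[arm == 0].mean()
          print(f"delta={delta:6.1f}  incremental cost={diff:9.2f}")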

  • Multiple imputation using chained equations: issues and guidance for practice
    Statistics in Medicine, 2011
    Co-Authors: Ian R. White, Patrick Royston, Angela M Wood
    Abstract:

    Multiple imputation by chained equations is a flexible and practical approach to Handling Missing Data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed Data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a Data set in mental health, giving Stata code fragments.
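
    The paper gives Stata fragments; a rough Python analogue of the chained-equations workflow is sketched below. scikit-learn's IterativeImputer with sample_posterior=True approximates a single chained-equations draw, so repeating it with different seeds yields the m completed data sets that Rubin's rules then pool. The toy matrix and m = 5 are assumptions for illustration.

      # Multiple imputation by chained equations, roughly (illustrative).
      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      X = np.array([[7.0, np.nan, 3.0],
                    [4.0, 2.0, np.nan],
                    [np.nan, 5.0, 6.0],
                    [8.0, 1.0, 2.0],
                    [5.0, 4.0, 4.0]])

      m = 5  # number of imputed data sets; skewed variables may need transforming first
      completed = [
          IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X)
          for seed in range(m)
      ]
      # Run the substantive analysis on each completed set, then pool the m
      # estimates with Rubin's rules (variance pooling not shown here).
      estimates = [c.mean(axis=0) for c in completed]
      pooled = np.mean(estimates, axis=0)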

Faraj Bashir - One of the best experts on this subject based on the ideXlab platform.

  • Handling Missing Data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm
    Neurocomputing, 2017
    Co-Authors: Faraj Bashir, Hua Liang Wei
    (Abstract as given above under Hua Liang Wei.)

  • Handling Missing Data in multivariate time series using a vector autoregressive model-based imputation (VAR-IM) algorithm: Part I, VAR-IM algorithm versus traditional methods
    Mediterranean Conference on Control and Automation, 2016
    Co-Authors: Faraj Bashir, Hua Liang Wei
    (Abstract as given above under Hua Liang Wei.)

Jonathan N Grauer - One of the best experts on this subject based on the ideXlab platform.

  • Missing Data may lead to changes in hip fracture Database studies: a study of the American College of Surgeons National Surgical Quality Improvement Program
    Journal of Bone and Joint Surgery (British Volume), 2018
    Co-Authors: Bryce A Basques, Ryan P Mclynn, Adam M Lukasiewicz, Andre M Samuel, Daniel D Bohl, Jonathan N Grauer
    Abstract:

    Aims: The aims of this study were to characterize the frequency of Missing Data in the National Surgical Quality Improvement Program (NSQIP) Database and to determine how Missing Data can influence the results of studies dealing with elderly patients with a fracture of the hip. Patients and Methods: Patients who underwent surgery for a fracture of the hip between 2005 and 2013 were identified from the NSQIP Database and the percentage of Missing Data was noted for demographics, comorbidities and laboratory values. These variables were tested for association with ‘any adverse event’ using multivariate regressions based on common ways of Handling Missing Data. Results: A total of 26,066 patients were identified. The rate of Missing Data was up to 77.9% for many variables. Multivariate regressions comparing three methods of Handling Missing Data found different risk factors for postoperative adverse events. Only seven of 35 identified risk factors (20%) were common to all three analyses. Conclusion: Missing Data is an ...
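
    The comparison the study performs, fitting the same outcome model under different missing-data strategies and checking whether the identified risk factors agree, can be sketched on synthetic data. Everything below (variable names, missingness rate, effect sizes) is hypothetical; it is not NSQIP code.

      # Fit one logistic model under three missing-data strategies and compare
      # the coefficients each yields (synthetic stand-in for the study design).
      import numpy as np
      import pandas as pd
      from sklearn.impute import SimpleImputer
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(1)
      n = 1000
      df = pd.DataFrame({"age": rng.normal(80, 8, n),
                         "albumin": rng.normal(3.5, 0.5, n)})
      logit = -8.0 + 0.08 * df["age"] - 0.5 * df["albumin"]
      df["adverse"] = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
      df.loc[rng.random(n) < 0.6, "albumin"] = np.nan  # heavy lab-value missingness

      def coefs(X, y):
          return LogisticRegression(max_iter=1000).fit(X, y).coef_[0]

      cc = df.dropna()                          # 1) complete-case analysis
      print("complete case:", coefs(cc[["age", "albumin"]], cc["adverse"]))

      Xi = SimpleImputer().fit_transform(df[["age", "albumin"]])  # 2) mean imputation
      print("mean imputed :", coefs(Xi, df["adverse"]))

      Xm = df.assign(albumin_missing=df["albumin"].isna().astype(float),
                     albumin=df["albumin"].fillna(0.0))           # 3) missing indicator
      print("indicator    :", coefs(Xm[["age", "albumin", "albumin_missing"]],
                                    Xm["adverse"]))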

Waddah B Alrefaie - One of the best experts on this subject based on the ideXlab platform.

  • Missing Data and interpretation of cancer surgery outcomes at the American College of Surgeons National Surgical Quality Improvement Program
    Journal of The American College of Surgeons, 2011
    Co-Authors: Helen M Parsons, William G Henderson, Jeanette Y Ziegenfuss, Michael Davern, Waddah B Alrefaie
    Abstract:

    Background: The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) has become an important surgical quality program in the United States, yet few studies describe their methods for Handling Missing Data. Our study examines the impact of Missing Data on predictive models for short-term operative outcomes after cancer surgery in the ACS NSQIP Database. Study Design: We identified 97,230 patients who underwent oncologic resections for neoplasms in the 2005–2009 ACS NSQIP. We used multivariable logistic regression to assess the impact of pre-, intra-, and postoperative factors on short-term operative outcomes by type of procedure, with Missing values alternately included as a variable category, excluded, or imputed. Results: A large proportion (72.8%) of patients had one or more Missing pre-, intra-, or postoperative characteristics, particularly preoperative laboratory values. Missing Data were more frequent in healthier patients and in those undergoing lower-risk procedures. Although Data were not Missing at random, the impact of preoperative risk factors on adverse operative outcomes after cancer surgery was similar across the methods for Handling Missing Data. However, analytic approaches using only patients with complete or imputed information risk basing the analysis on a potentially nonrepresentative sample. Conclusions: Missing Data present challenges to interpreting predictors of short-term operative outcomes after cancer surgery at ACS NSQIP hospitals. In line with best practices for other Data sets, this study highlights the importance of handling Missing values carefully when using ACS NSQIP. Given its potential to introduce bias, the approach to Handling Missing values should be detailed in future ACS NSQIP studies.
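
    Of the three strategies weighed above, treating Missing values as their own variable category is the least standard, so a small sketch may help: bin the continuous lab value and give the gaps an explicit level, which then enters a regression as its own dummy term. The column name and cut points are invented for illustration.

      # Encode a lab value as categories with an explicit "missing" level.
      import numpy as np
      import pandas as pd

      df = pd.DataFrame({"creatinine": [0.8, np.nan, 1.4, 2.3, np.nan, 1.0]})

      cat = pd.cut(df["creatinine"],
                   bins=[0.0, 1.0, 1.5, np.inf],
                   labels=["normal", "elevated", "high"])
      cat = cat.cat.add_categories("missing").fillna("missing")

      # One-hot encode so "missing" gets its own coefficient in the model.
      design = pd.get_dummies(cat, prefix="creatinine")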