Sample Mean


The experts below are selected from a list of 565,404 experts worldwide, ranked by the ideXlab platform.

Dominique Lord - One of the best experts on this subject based on the ideXlab platform.

  • Bias properties of Bayesian statistics in finite mixture of negative binomial regression models in crash data analysis
    Accident Analysis & Prevention, 2010
    Co-Authors: Byungjung Park, Dominique Lord, Jeffrey D Hart
    Abstract:

    Factors that cause heterogeneity in crash data are often unknown to researchers, and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture for the negative binomial regression model has shown a potential advantage in addressing this unobserved heterogeneity, as well as providing useful information about features of the population under study. Despite its usefulness, however, no study has examined the performance of this finite mixture under the various sample sizes and sample-mean values that are common in crash data analysis. This study investigated the bias associated with the Bayesian summary statistics (posterior mean and median) of the dispersion parameters in the two-component finite mixture of negative binomial regression models. A simulation study was conducted using various sample sizes under different sample-mean values. Two prior specifications (non-informative and weakly informative) on the dispersion parameter were also compared. The results showed that the posterior mean under the non-informative prior exhibited a high bias for the dispersion parameter and should be avoided when the dataset contains fewer than 2,000 observations (even for high sample-mean values). The posterior median showed much better bias properties, particularly at small sample sizes and small sample means. However, as the sample size increases, the posterior median under the non-informative prior also begins to exhibit an upward bias. In such cases, the posterior mean or median with the weakly informative prior provides smaller bias. Based on the simulation results, guidelines on the choice of priors and summary statistics are presented for different sample sizes and sample-mean values.

  • Examining application of aggregated and disaggregated Poisson-gamma models subjected to low sample-mean bias
    Transportation Research Record, 2009
    Co-Authors: Dominique Lord, Maneesh Mahlawat
    Abstract:

    Two general classes of models have been proposed for modeling crash data: disaggregated models (with and without a time trend) and aggregated models. Poisson-gamma models have traditionally been used under both model classes. As documented in previous studies, datasets characterized by a small sample size and low mean values can significantly affect the performance of Poisson-gamma models, particularly the estimation of the inverse dispersion parameter. Thus, guidance is needed on when to use aggregated rather than disaggregated models as a function of the sample size and the sample-mean value. The objective of this study was to estimate the conditions under which aggregated models (with a higher mean but a smaller sample size) provide a more reliable estimate of the inverse dispersion parameter than disaggregated models (with a lower sample-mean value but a larger sample size), or vice versa. To accomplish this objective, several simulation runs were performed for different v...

  • Adjustment for maximum likelihood estimate of negative binomial dispersion parameter
    Transportation Research Record, 2008
    Co-Authors: Byungjung Park, Dominique Lord
    Abstract:

    The negative binomial (NB) (or Poisson-gamma) model has been used extensively by highway safety analysts because it can accommodate the overdispersion often exhibited in crash data. However, it has been reported in the literature that the maximum likelihood estimate of the dispersion parameter of NB models can be significantly affected when the data are characterized by a small sample size and a low sample mean. Given the important roles of the dispersion parameter in various types of highway safety analyses, there is a need to determine whether this bias can be corrected or minimized. The objectives of this study are to explore whether a systematic relationship exists between the estimated and true dispersion parameters, to determine the bias as a function of the sample size and sample mean, and to develop a procedure for correcting the bias caused by these two conditions. For this purpose, simulated data were used to derive the relationship under various combinations of sample mean, dispersion pa...

  • Effects of low sample-mean values and small sample size on the estimation of the fixed dispersion parameter of Poisson-gamma models for modeling motor vehicle crashes: a Bayesian perspective
    Safety Science, 2008
    Co-Authors: Dominique Lord, Luis F Mirandamoreno
    Abstract:

    There has been considerable research conducted on the development of statistical models for predicting motor vehicle crashes on highway facilities. Over the last few years, there has been a significant increase in the application of hierarchical Bayes methods for modeling motor vehicle crash data. Whether inferences are estimated using classical or Bayesian methods, the most common probabilistic structure used for modeling this type of data remains the traditional Poisson-gamma (or negative binomial) model. Crash data collected for highway safety studies often have the unusual attributes of being characterized by low sample-mean values and, due to the prohibitive costs of collecting data, small sample sizes. Previous studies have shown that the dispersion parameter of Poisson-gamma models can be seriously mis-estimated when the models are estimated using the maximum likelihood estimation (MLE) method under these extreme conditions. Despite important work done on this topic for the MLE, no study has so far examined how low sample-mean values and small sample sizes affect the posterior mean of the dispersion parameter of Poisson-gamma models estimated using the hierarchical Bayes method. The inverse dispersion parameter plays an important role in various types of highway safety studies. It is therefore vital to determine the conditions under which the inverse dispersion parameter may be mis-estimated for this category of models. To accomplish the objectives of this study, a simulation framework is developed to generate data from Poisson-gamma distributions using different values of the mean, the dispersion parameter, the sample size, and the prior specification. Vague and non-vague prior specifications are tested to determine the magnitude of the biases introduced by low sample-mean values and small sample sizes.
    A series of datasets are also simulated from Poisson-lognormal distributions, in light of recent work by statisticians on this mixed distribution. The study shows that a dataset characterized by a low sample mean combined with a small sample size can seriously affect the estimation of the posterior mean of the dispersion parameter when a vague prior specification is used to characterize the gamma hyper-parameter. The risk of a mis-estimated posterior mean can be greatly minimized when an appropriate non-vague prior distribution is used. Finally, the study shows that Poisson-lognormal models are recommended over Poisson-gamma models when vague priors are assumed and whenever crash data characterized by low sample-mean values are used for developing crash prediction models.
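The simulation framework described above can be sketched in a few lines. This is an illustrative re-implementation, not the authors' code; the function names and the parameter values (mu, phi, n) are my own choices, picked to mimic a "low sample mean, small sample size" scenario.

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplicative algorithm for a Poisson(lam) variate.
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_poisson_gamma(n, mu, phi, rng):
    # lambda_i ~ Gamma(shape=phi, scale=mu/phi), so E[lambda_i] = mu;
    # y_i | lambda_i ~ Poisson(lambda_i) then yields a Poisson-gamma (NB)
    # sample with mean mu and variance mu + mu**2 / phi.
    return [sample_poisson(rng.gammavariate(phi, mu / phi), rng)
            for _ in range(n)]

rng = random.Random(2008)
# Low sample mean (mu = 0.5) and small sample size (n = 50):
crashes = simulate_poisson_gamma(n=50, mu=0.5, phi=1.0, rng=rng)
```

Sweeping mu, phi, and n over a grid of such calls reproduces the kind of experimental design the abstract describes.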

  • Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample-mean values and small sample size on the estimation of the fixed dispersion parameter
    Accident Analysis & Prevention, 2006
    Co-Authors: Dominique Lord
    Abstract:

    There has been considerable research conducted on the development of statistical models for predicting crashes on highway facilities. Despite numerous advancements in the estimation tools for statistical models, the most common probabilistic structures used for modeling motor vehicle crashes remain the traditional Poisson and Poisson-gamma (or negative binomial) distributions; when crash data exhibit over-dispersion, the Poisson-gamma model is usually the model most favored by transportation safety modelers. Crash data collected for safety studies often have the unusual attribute of being characterized by low sample-mean values. Studies have shown that the goodness-of-fit of statistical models produced from such datasets can be significantly affected. This issue has been defined as the "low mean problem" (LMP). Despite recent developments on methods to circumvent the LMP and to test the goodness-of-fit of models developed using such datasets, no work has so far examined how the LMP affects the fixed dispersion parameter of Poisson-gamma models used for modeling motor vehicle crashes. The dispersion parameter plays an important role in many types of safety studies and should therefore be reliably estimated. The primary objective of this research was to verify whether the LMP affects the estimation of the dispersion parameter and, if so, to determine the magnitude of the problem. The secondary objective was to determine the effects of an unreliably estimated dispersion parameter on common analyses performed in highway safety studies. To accomplish these objectives, a series of Poisson-gamma distributions were simulated using different values of the mean, the dispersion parameter, and the sample size.
    Three estimators commonly used by transportation safety modelers for estimating the dispersion parameter of Poisson-gamma models were evaluated: the method of moments, weighted regression, and maximum likelihood. To complement the simulation study, Poisson-gamma models were fitted to crash data collected in Toronto, Ont., characterized by a low sample mean and small sample size. The study shows that a low sample mean combined with a small sample size can seriously affect the estimation of the dispersion parameter, no matter which estimator is used. The probability that the dispersion parameter is unreliably estimated increases significantly as the sample mean and sample size decrease. Consequently, the results show that an unreliably estimated dispersion parameter can significantly undermine empirical Bayes (EB) estimates as well as the estimation of confidence intervals for the gamma mean and the predicted response. The paper ends with recommendations for minimizing the likelihood of producing Poisson-gamma models with an unreliable dispersion parameter for modeling motor vehicle crashes.
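Of the three estimators named above, the method of moments is the simplest to write down: since Var(y) = mu + alpha * mu^2 for the Poisson-gamma (NB) model, solving for the dispersion parameter gives alpha-hat = (s^2 - ybar) / ybar^2. A minimal sketch (illustrative only; the count data below are made up, not from the Toronto dataset):

```python
from statistics import mean, variance

def mom_dispersion(counts):
    # Method-of-moments estimate of the NB dispersion parameter alpha,
    # from Var(y) = mu + alpha * mu**2.  With a low sample mean and a
    # small sample, this estimate can be noisy or even negative.
    m = mean(counts)
    s2 = variance(counts)   # unbiased sample variance
    return (s2 - m) / m ** 2

# Hypothetical crash counts with a low sample mean:
counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 0, 2, 1]
alpha_hat = mom_dispersion(counts)
```

Repeating this on many simulated Poisson-gamma samples, and comparing alpha_hat against the true alpha, is the kind of evaluation the abstract describes for all three estimators.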

Tiejun Tong - One of the best experts on this subject based on the ideXlab platform.

  • Optimally estimating the sample standard deviation from the five-number summary
    Research Synthesis Methods, 2020
    Co-Authors: Jiandong Shi, Dehui Luo, Hong Weng, Xiantao Zeng, Lu Lin, Haitao Chu, Tiejun Tong
    Abstract:

    When reporting the results of clinical studies, some researchers may choose the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation (SD), particularly for skewed data. When such studies are included in a meta-analysis, it is often desirable to convert the five-number summary back to the sample mean and SD. For this purpose, several methods have been proposed in the recent literature and are increasingly used. In this article, we further advance the literature by developing a smoothly weighted estimator for the sample SD that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample SD. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods dramatically improve the existing methods for data transformation and can serve as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are provided for implementing our optimal estimators.
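The estimator in this paper builds on Wan et al.'s (2014) quantile-based formulas, which can be sketched with Python's standard library. The equal 0.5/0.5 weighting below is the earlier naive combination; the paper's contribution is replacing it with a smoothly weighted, sample-size-dependent combination, which is not reproduced here.

```python
from statistics import NormalDist

def sd_from_five_number(a, q1, med, q3, b, n):
    # Estimate the sample SD under normality from the five-number
    # summary (min a, quartiles q1/q3, median med, max b) of n values.
    # The median carries no scale information, so only the range and
    # the interquartile range enter the formulas (Wan et al., 2014).
    z = NormalDist().inv_cdf
    s_range = (b - a) / (2 * z((n - 0.375) / (n + 0.25)))
    s_iqr = (q3 - q1) / (2 * z((0.75 * n - 0.125) / (n + 0.25)))
    return 0.5 * s_range + 0.5 * s_iqr   # naive equal-weight average

# Five-number summary of (roughly) standard-normal data, n = 100:
s_hat = sd_from_five_number(-2.33, -0.67, 0.0, 0.67, 2.33, 100)
```

For this input, s_hat lands close to the true SD of 1, as expected under the normality assumption.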

  • Optimally estimating the sample standard deviation from the five-number summary
    arXiv: Methodology, 2020
    Co-Authors: Jiandong Shi, Dehui Luo, Hong Weng, Xiantao Zeng, Lu Lin, Haitao Chu, Tiejun Tong
    Abstract:

    When reporting the results of clinical studies, some researchers may choose the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation, particularly for skewed data. When such studies are included in a meta-analysis, it is often desirable to convert the five-number summary back to the sample mean and standard deviation. For this purpose, several methods have been proposed in the recent literature and are increasingly used. In this paper, we further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample standard deviation. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods dramatically improve the existing methods for data transformation and can serve as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are provided for implementing our optimal estimators.

  • Optimally estimating the sample mean and standard deviation from the five-number summary
    2020
    Co-Authors: Jiandong Shi, Dehui Luo, Hong Weng, Xiantao Zeng, Lu Lin, Haitao Chu, Tiejun Tong
    Abstract:

    When reporting the results of clinical studies, some researchers may choose the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation, particularly for skewed data. When such studies are included in a meta-analysis, it is often desirable to convert the five-number summary back to the sample mean and standard deviation. For this purpose, several methods have been proposed in the recent literature and are increasingly used. In this paper, we further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample standard deviation. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods dramatically improve the existing methods for data transformation and can serve as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are provided for implementing our optimal estimators.

  • Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range
    Statistical Methods in Medical Research, 2018
    Co-Authors: Dehui Luo, Xiang Wan, Jiming Liu, Tiejun Tong
    Abstract:

    The era of big data is coming, and evidence-based medicine is attracting increasing attention as a way to improve decision making in medical practice by integrating evidence from well designed and well conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimate of treatment effectiveness. The sample mean and standard deviation are the two statistics most commonly used in meta-analysis, but some trials report the median, the minimum and maximum values, or sometimes the first and third quartiles instead. Thus, to pool results in a consistent format, researchers need to transform this information back to the sample mean and standard deviation. In this article, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, needless to say its imp...
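The location estimates named in the title, i.e. the median, the mid-range (min+max)/2, and the mid-quartile range (q1+q3)/2, can each serve as a crude stand-in for the sample mean. The sketch below simply averages whichever are available; the paper's point is that the optimal weights depend on the sample size n, which this naive version ignores, so treat it only as a statement of the ingredients.

```python
def naive_mean_estimate(med, a=None, b=None, q1=None, q3=None):
    # Equal-weight average of the available location estimates:
    # the median, the mid-range (a+b)/2 when min/max are reported,
    # and the mid-quartile range (q1+q3)/2 when quartiles are.
    # Illustrative only; the optimally weighted estimator in the
    # paper is sample-size dependent.
    parts = [med]
    if a is not None and b is not None:
        parts.append((a + b) / 2)     # mid-range
    if q1 is not None and q3 is not None:
        parts.append((q1 + q3) / 2)   # mid-quartile range
    return sum(parts) / len(parts)

# For a symmetric summary, every component agrees:
est = naive_mean_estimate(10.0, a=4.0, b=16.0, q1=8.0, q3=12.0)
```

For skewed data the three components disagree, which is exactly where the choice of weights starts to matter.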

  • How to estimate the sample mean and standard deviation from the five-number summary
    arXiv: Methodology, 2018
    Co-Authors: Jiandong Shi, Dehui Luo, Hong Weng, Xiantao Zeng, Lu Lin, Tiejun Tong
    Abstract:

    In some clinical studies, researchers may report the five-number summary (the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation. To conduct a meta-analysis pooling such studies, one needs to first estimate the sample mean and standard deviation from the five-number summary. A number of methods have been proposed in the recent literature to solve this problem. However, none of the existing estimators for the standard deviation is satisfactory for practical use. After a brief review of the existing literature, we point out that Wan et al.'s method (BMC Med Res Methodol 14:135, 2014) has a serious limitation in estimating the standard deviation from the five-number summary. To improve it, we propose a smoothly weighted estimator that incorporates the sample size information, and we derive the optimal weight for the new estimator. For ease of implementation, we also provide an approximation formula for the optimal weight and a shortcut formula for estimating the standard deviation from the five-number summary. The performance of the proposed estimator is evaluated through two simulation studies. In comparison with Wan et al.'s estimator, our new estimator provides a more accurate estimate for normal data and performs favorably for non-normal data. In real data analysis, our new method also provides a more accurate estimate of the true sample standard deviation than the existing method. In this paper, we propose an optimal estimator of the standard deviation from the five-number summary. Together with the optimal mean estimator in Luo et al. (Stat Methods Med Res, in press, 2017), our new methods improve on the existing literature and will make a solid contribution to meta-analysis and evidence-based medicine.

Secundino Lopez - One of the best experts on this subject based on the ideXlab platform.

  • A strategy for modelling heavy-tailed greenhouse gas (GHG) data using the generalised extreme value distribution: are we overestimating GHG flux using the sample mean?
    Atmospheric Environment, 2020
    Co-Authors: M S Dhanoa, Aranzazu Louro, L M Cardenas, Anita Shepherd, Ruth Sanderson, Secundino Lopez
    Abstract:

    In this study, we draw up a strategy for the analysis of greenhouse gas (GHG) field data. The distribution of GHG flux data generally exhibits excessive skewness and kurtosis, resulting in a tail much heavier than that of a log-normal distribution or of outlier-induced skewness. The generalised extreme value (GEV) distribution is well suited to modelling such data. We evaluated the GEV as a model for the analysis, and as a means of extracting a robust average, of carbon dioxide (CO2) and nitrous oxide (N2O) flux data measured in an agricultural field. The option of transforming CO2 flux data to the Box-Cox scale in order to normalise the distribution was also investigated. The results showed that average CO2 estimates from the GEV are less affected by data in the long tail than the sample mean is. The N2O flux data were much more complex than the CO2 flux data due to the presence of negative fluxes. The GEV estimate of the average value was much more consistent with the position of maximum data frequency. The GEV analysis, which accounts for the effects of hot-spot-like observations, suggests that sample means and log-means may overestimate GHG fluxes from agricultural fields. In this study, the arithmetic CO2 sample mean of 65.6 (log-scale mean 65.9) kg CO2-C ha⁻¹ d⁻¹ was reduced to a GEV mean of 60.1 kg CO2-C ha⁻¹ d⁻¹. The arithmetic N2O sample mean of 1.038 (log-scale mean 1.038) kg N2O-N ha⁻¹ d⁻¹ was substantially reduced to a GEV mean of 0.0157 kg N2O-N ha⁻¹ d⁻¹. Our analysis suggests that GHG data should be analysed assuming a GEV distribution, with a Box-Cox transformation when negative values are observed, rather than by calculating only basic log and log-normal summaries. Results of GHG studies may end up in national inventories, so it is important to follow all procedures that help minimise bias in the data.
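The paper's central caution, that the arithmetic sample mean of heavy-tailed flux data is dragged upward by the long tail, can be illustrated with a lognormal stand-in (the GEV fit itself needs a statistics library and is not reproduced here; the sigma value and sample size are arbitrary choices of mine, not the paper's data):

```python
import random
from statistics import mean, median

rng = random.Random(7)
# Heavy-tailed stand-in for flux measurements: lognormal, sigma = 1.5.
flux = [rng.lognormvariate(0.0, 1.5) for _ in range(5000)]

m_arith = mean(flux)     # pulled up by the long right tail
m_robust = median(flux)  # sits near the bulk of the observations
# For this distribution the true median is 1.0 while the true mean
# is exp(1.5**2 / 2), roughly 3.1, so m_arith lands far above m_robust.
```

The GEV-based average in the paper plays the role m_robust plays here: an estimate of the typical flux that hot-spot-like observations cannot dominate.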

Andrea Benedetti - One of the best experts on this subject based on the ideXlab platform.

  • Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis
    Statistical Methods in Medical Research, 2020
    Co-Authors: Sean Mcgrath, Xiaofei Zhao, Russell Steele, Brett D Thombs, Andrea Benedetti
    Abstract:

    Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in a meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.

Byungjung Park - One of the best experts on this subject based on the ideXlab platform.

  • Bias properties of Bayesian statistics in finite mixture of negative binomial regression models in crash data analysis
    Accident Analysis & Prevention, 2010
    Co-Authors: Byungjung Park, Dominique Lord, Jeffrey D Hart
    Abstract:

    Factors that cause heterogeneity in crash data are often unknown to researchers, and failure to accommodate such heterogeneity in statistical models can undermine the validity of empirical results. A recently proposed finite mixture for the negative binomial regression model has shown a potential advantage in addressing this unobserved heterogeneity, as well as providing useful information about features of the population under study. Despite its usefulness, however, no study has examined the performance of this finite mixture under the various sample sizes and sample-mean values that are common in crash data analysis. This study investigated the bias associated with the Bayesian summary statistics (posterior mean and median) of the dispersion parameters in the two-component finite mixture of negative binomial regression models. A simulation study was conducted using various sample sizes under different sample-mean values. Two prior specifications (non-informative and weakly informative) on the dispersion parameter were also compared. The results showed that the posterior mean under the non-informative prior exhibited a high bias for the dispersion parameter and should be avoided when the dataset contains fewer than 2,000 observations (even for high sample-mean values). The posterior median showed much better bias properties, particularly at small sample sizes and small sample means. However, as the sample size increases, the posterior median under the non-informative prior also begins to exhibit an upward bias. In such cases, the posterior mean or median with the weakly informative prior provides smaller bias. Based on the simulation results, guidelines on the choice of priors and summary statistics are presented for different sample sizes and sample-mean values.

  • Adjustment for maximum likelihood estimate of negative binomial dispersion parameter
    Transportation Research Record, 2008
    Co-Authors: Byungjung Park, Dominique Lord
    Abstract:

    The negative binomial (NB) (or Poisson-gamma) model has been used extensively by highway safety analysts because it can accommodate the overdispersion often exhibited in crash data. However, it has been reported in the literature that the maximum likelihood estimate of the dispersion parameter of NB models can be significantly affected when the data are characterized by a small sample size and a low sample mean. Given the important roles of the dispersion parameter in various types of highway safety analyses, there is a need to determine whether this bias can be corrected or minimized. The objectives of this study are to explore whether a systematic relationship exists between the estimated and true dispersion parameters, to determine the bias as a function of the sample size and sample mean, and to develop a procedure for correcting the bias caused by these two conditions. For this purpose, simulated data were used to derive the relationship under various combinations of sample mean, dispersion pa...