Unbiased Estimator

The Experts below are selected from a list of 306 Experts worldwide, ranked by the ideXlab platform.

Yves Grandvalet - One of the best experts on this subject based on the ideXlab platform.

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Journal of Machine Learning Research, 2004
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation Estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) Unbiased Estimator of the variance of K-fold cross-validation. The analysis that accompanies this result is based on the eigen-decomposition of the covariance matrix of errors, which has only three different eigenvalues corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how it can make naive Estimators (that don't take into account the error correlations due to the overlap between training and test sets) grossly underestimate variance. This is confirmed by numerical experiments in which the three components of the variance are compared when the difficulty of the learning problem and the number of folds are varied.
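The underestimation by naive variance estimators can be illustrated with a small simulation. The sketch below is not from the paper; the predictor (the training-set mean), sample size, and fold count are arbitrary choices. It compares the Monte Carlo variance of a 5-fold CV estimate across many replications against the average "independent folds" variance estimate computed within each replication:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, reps = 40, 5, 2000

def kfold_cv_mse(y, K):
    """Per-fold MSEs of the 'predict the training-set mean' model."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for test_idx in folds:
        train = np.delete(y, test_idx)
        errs.append(np.mean((y[test_idx] - train.mean()) ** 2))
    return np.array(errs)

cv_estimates, naive_vars = [], []
for _ in range(reps):
    y = rng.normal(size=n)
    errs = kfold_cv_mse(y, K)
    cv_estimates.append(errs.mean())
    # Naive estimate: treats the K fold errors as independent samples,
    # ignoring the correlations induced by overlapping training sets.
    naive_vars.append(errs.var(ddof=1) / K)

true_var = np.var(cv_estimates, ddof=1)  # Monte Carlo variance of the CV estimate
naive = float(np.mean(naive_vars))
print(f"true variance of CV estimate: {true_var:.4f}")
print(f"average naive estimate:       {naive:.4f}")
```

The naive figure is typically the smaller of the two, since it omits the positive covariance terms between fold errors; how large the gap is depends on the learning problem and the number of folds.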

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Neural Information Processing Systems, 2003
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation Estimator. The main theorem shows that there exists no universal Unbiased Estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive Estimators may grossly underestimate variance, as confirmed by numerical experiments.

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Research Papers in Economics, 2003
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    In statistical machine learning, the standard measure of a model's accuracy is the prediction error, i.e., the expected loss on future examples. When the data distribution is unknown, this error cannot be computed directly, but several resampling methods, such as K-fold cross-validation, can be used to obtain an Unbiased Estimator of the prediction error. To compare learning algorithms, however, one also needs to estimate the uncertainty around the cross-validation Estimator, which is important because it can be very large. The usual variance estimates for means of independent samples cannot be used, because the data are reused across the training sets that form the cross-validation Estimator. The main result of this paper is that there is no universal (distribution-independent) Unbiased Estimator of the variance of the K-fold cross-validation Estimator based only on the error measurements obtained through the cross-validation procedure. The analysis provides a theoretical understanding of the difficulty of this estimation. These results generalize to other resampling methods in which data are reused for training or testing.

Yoshua Bengio - One of the best experts on this subject based on the ideXlab platform.

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Journal of Machine Learning Research, 2004
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation Estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) Unbiased Estimator of the variance of K-fold cross-validation. The analysis that accompanies this result is based on the eigen-decomposition of the covariance matrix of errors, which has only three different eigenvalues corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how it can make naive Estimators (that don't take into account the error correlations due to the overlap between training and test sets) grossly underestimate variance. This is confirmed by numerical experiments in which the three components of the variance are compared when the difficulty of the learning problem and the number of folds are varied.

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Neural Information Processing Systems, 2003
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation Estimator. The main theorem shows that there exists no universal Unbiased Estimator of the variance of K-fold cross-validation. An analysis based on the eigendecomposition of the covariance matrix of errors helps to better understand the nature of the problem and shows that naive Estimators may grossly underestimate variance, as confirmed by numerical experiments.

  • No Unbiased Estimator of the Variance of K-Fold Cross-Validation
    Research Papers in Economics, 2003
    Co-Authors: Yoshua Bengio, Yves Grandvalet
    Abstract:

    In statistical machine learning, the standard measure of a model's accuracy is the prediction error, i.e., the expected loss on future examples. When the data distribution is unknown, this error cannot be computed directly, but several resampling methods, such as K-fold cross-validation, can be used to obtain an Unbiased Estimator of the prediction error. To compare learning algorithms, however, one also needs to estimate the uncertainty around the cross-validation Estimator, which is important because it can be very large. The usual variance estimates for means of independent samples cannot be used, because the data are reused across the training sets that form the cross-validation Estimator. The main result of this paper is that there is no universal (distribution-independent) Unbiased Estimator of the variance of the K-fold cross-validation Estimator based only on the error measurements obtained through the cross-validation procedure. The analysis provides a theoretical understanding of the difficulty of this estimation. These results generalize to other resampling methods in which data are reused for training or testing.

Tetsuya Akita - One of the best experts on this subject based on the ideXlab platform.

  • Nearly Unbiased Estimator of Contemporary Effective Mother Size Using Within-Cohort Maternal Sibling Pairs Incorporating Parental and Nonparental Reproductive Variations
    Heredity, 2020
    Co-Authors: Tetsuya Akita
    Abstract:

    In this study, we developed a nearly Unbiased Estimator of contemporary effective mother size in a population, which is based on a known maternal half-sibling relationship found within the same cohort. Our method allows for variance of the average number of offspring per mother (i.e., parental variation, such as age-specific fecundity) and variance of the number of offspring among mothers with identical reproductive potential (i.e., nonparental variation, such as family-correlated survivorship). We also developed Estimators of the variance and coefficient of variation of contemporary effective mother size and qualitatively evaluated the performance of the Estimators by running an individual-based model. Our results provide guidance for (i) a sample size to ensure the required accuracy and precision when the order of effective mother size is available and (ii) a degree of uncertainty regarding the estimated effective mother size when information about the size is unavailable. To the best of our knowledge, this is the first report to demonstrate the derivation of a nearly Unbiased Estimator of effective population size; however, its current application is limited to effective mother size and to situations in which the sample size is not particularly small and maternal half-sibling relationships can be detected without error. The results of this study demonstrate the usefulness of a sibship assignment method for estimating effective population size; in addition, they have the potential to greatly widen the scope of genetic monitoring, especially when the sample size is small.
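As a rough illustration of the sibship idea, the sketch below is a deliberate simplification, not Akita's estimator: it assumes every mother is equally likely to be the parent of each offspring (no parental or nonparental variation) and omits the near-unbiasedness correction. It simply inverts the expected number of maternal half-sibling pairs among the sampled offspring:

```python
import math

def naive_mother_size(n_offspring: int, n_hsp: int) -> float:
    """Method-of-moments estimate of effective mother size from maternal
    half-sibling pairs (HSPs) found within one cohort.

    Assumes equal fecundity across mothers and error-free HSP detection,
    so E[n_hsp] ~ C(n_offspring, 2) / N; inverting that expectation gives
    the (upward-biased) naive estimate that more careful estimators correct.
    """
    if n_hsp == 0:
        raise ValueError("no half-sibling pairs observed")
    n_pairs = math.comb(n_offspring, 2)  # offspring pairs examined
    return n_pairs / n_hsp

# Hypothetical numbers: 200 sampled offspring, 25 maternal HSPs detected
print(naive_mother_size(200, 25))  # -> 796.0
```

Inverting an expectation like this is biased upward by Jensen's inequality, which is one reason a correction of the kind derived in the paper is needed.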

  • Nearly Unbiased Estimator of Contemporary Effective Population Size Using Within-Cohort Sibling Pairs Incorporating Parental and Non-Parental Reproductive Variations
    bioRxiv, 2019
    Co-Authors: Tetsuya Akita
    Abstract:

    In this study, we developed a nearly Unbiased Estimator of contemporary effective mother size in a population, which is based on a known maternal half-sibling relationship found within the same cohort. Our method allows for variance of the average number of offspring per mother (i.e., parental variation, such as age-specific fecundity) and variance of the number of offspring among mothers with identical reproductive potential (i.e., non-parental variation, such as family-correlated survivorship). We also developed Estimators of the variance and coefficient of variation of contemporary effective mother size and qualitatively evaluated the performance of the Estimators by running an individual-based model. Our results provide guidance for (i) a sample size to ensure the required accuracy and precision when the order of effective mother size is available and (ii) a degree of uncertainty regarding the estimated effective mother size when information about the size is unavailable. To the best of our knowledge, this is the first report to demonstrate the derivation of a nearly Unbiased Estimator of effective population size; however, its current application is limited to effective mother size and to situations in which the sample size is not particularly small and maternal half-sibling relationships can be detected without error. The results of this study demonstrate the usefulness of a sibship assignment method for estimating effective population size; in addition, they have the potential to greatly widen the scope of genetic monitoring.

Klausrobert Muller - One of the best experts on this subject based on the ideXlab platform.

  • Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise
    The IEICE transactions on information and systems, 2008
    Co-Authors: Masashi Sugiyama, Motoaki Kawanabe, Gilles Blanchard, Klausrobert Muller
    Abstract:

    Obtaining the best linear Unbiased Estimator (BLUE) of noisy signals is a traditional but powerful approach to noise reduction. Explicitly computing the BLUE usually requires the prior knowledge of the noise covariance matrix and the subspace to which the true signal belongs. However, such prior knowledge is often unavailable in reality, which prevents us from applying the BLUE to real-world problems. To cope with this problem, we give a practical procedure for approximating the BLUE without such prior knowledge. Our additional assumption is that the true signal follows a non-Gaussian distribution while the noise is Gaussian.
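When the signal subspace and the noise covariance are both known, the BLUE is the oblique (generalized least squares) projection onto the signal subspace. A minimal sketch follows, with `A` and `Sigma` standing in for exactly the prior knowledge the paper says is usually unavailable; the dimensions and values are arbitrary:

```python
import numpy as np

def blue(y, A, Sigma):
    """BLUE of a signal known to lie in the column space of A, observed
    in zero-mean noise with covariance Sigma:
        x_hat = A (A^T S^-1 A)^-1 A^T S^-1 y,  S = Sigma.
    An oblique projection onto col(A); it leaves any signal already in
    col(A) unchanged, which is the unbiasedness property."""
    Si = np.linalg.inv(Sigma)
    G = A @ np.linalg.solve(A.T @ Si @ A, A.T @ Si)
    return G @ y

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 2))           # known 2-D signal subspace (assumed)
x = A @ np.array([1.0, -2.0])          # true signal
Sigma = 0.1 * np.eye(10)               # known noise covariance (assumed)
y = x + np.sqrt(0.1) * rng.normal(size=10)
x_hat = blue(y, A, Sigma)
print(np.linalg.norm(x_hat - x))       # residual after noise reduction
```

The paper's contribution is precisely the case where `A` and `Sigma` are unknown and must be approximated from data under the non-Gaussian-signal / Gaussian-noise assumption.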

  • Obtaining the Best Linear Unbiased Estimator of Noisy Signals by Non-Gaussian Component Analysis
    International Conference on Acoustics Speech and Signal Processing, 2006
    Co-Authors: Masashi Sugiyama, Motoaki Kawanabe, Gilles Blanchard, Klausrobert Muller, V Spokiny
    Abstract:

    Obtaining the best linear Unbiased Estimator (BLUE) of noisy signals is a traditional but powerful approach to noise reduction. Explicitly computing the BLUE usually requires prior knowledge of the subspace to which the true signal belongs and the noise covariance matrix. However, such prior knowledge is often unavailable in reality, which prevents us from applying the BLUE to real-world problems. In this paper, we therefore give a method for obtaining the BLUE without such prior knowledge. Our additional assumption is that the true signal follows a non-Gaussian distribution while the noise is Gaussian.

  • ICASSP (3) - Obtaining the Best Linear Unbiased Estimator of Noisy Signals by Non-Gaussian Component Analysis
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
    Co-Authors: Masashi Sugiyama, Motoaki Kawanabe, Gilles Blanchard, V Spokiny, Klausrobert Muller
    Abstract:

    Obtaining the best linear Unbiased Estimator (BLUE) of noisy signals is a traditional but powerful approach to noise reduction. Explicitly computing the BLUE usually requires prior knowledge of the subspace to which the true signal belongs and the noise covariance matrix. However, such prior knowledge is often unavailable in reality, which prevents us from applying the BLUE to real-world problems. In this paper, we therefore give a method for obtaining the BLUE without such prior knowledge. Our additional assumption is that the true signal follows a non-Gaussian distribution while the noise is Gaussian.

Virendra Kishore Srivastava - One of the best experts on this subject based on the ideXlab platform.

  • An Unbiased Estimator of the Covariance Matrix of the Mixed Regression Estimator
    Journal of the American Statistical Association, 1991
    Co-Authors: David E. A. Giles, Virendra Kishore Srivastava
    Abstract:

    This article derives an Unbiased Estimator of the covariance matrix of the “mixed regression” Estimator suggested by Theil and Goldberger for combining prior information with the sample information in regression analysis. This derivation facilitates the construction of finite-sample standard errors for the mixed Estimators of the individual regression coefficients. Comparisons are made between the Unbiased covariance Estimator, the conventional consistent Estimator based on the generalized least squares formula, and a simple modification of the latter, which is found to approximate the Unbiased Estimator well in practical situations.
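A sketch of the mixed-regression setup may help. The function below implements the Theil–Goldberger mixed estimator together with the conventional GLS-formula covariance matrix, i.e., the consistent estimator whose finite-sample bias Giles and Srivastava address; their unbiased correction is not reproduced here, and all variable names are hypothetical:

```python
import numpy as np

def mixed_regression(X, y, R, r, Omega):
    """Theil-Goldberger mixed estimator.

    Sample information:  y = X b + u,  u ~ (0, s2 * I)  (s2 unknown)
    Prior restrictions:  r = R b + v,  v ~ (0, Omega)

    Returns the mixed estimate of b and the *conventional* covariance
    matrix (the plug-in GLS formula), which is only consistent, not
    unbiased, in finite samples.
    """
    n, k = X.shape
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - k)   # plug-in error variance
    Oi = np.linalg.inv(Omega)
    M = X.T @ X / s2 + R.T @ Oi @ R               # combined precision
    b_mixed = np.linalg.solve(M, X.T @ y / s2 + R.T @ Oi @ r)
    cov_conventional = np.linalg.inv(M)
    return b_mixed, cov_conventional
```

As `Omega` shrinks, the prior restrictions dominate and the estimate is pulled toward satisfying `R b = r`; as `Omega` grows, the estimate approaches ordinary least squares.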