Covariance Parameter

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 81 experts worldwide, ranked by the ideXlab platform.

Florian Gerber - One of the best experts on this subject based on the ideXlab platform.

  • Parallel cross-validation: A scalable fitting method for Gaussian process models
    Computational Statistics & Data Analysis, 2021
    Co-Authors: Florian Gerber, Douglas W Nychka
    Abstract:

    Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. They are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide sub-optimal support for the parallel computing capabilities of (high-performance) computing environments. To bridge this gap, a parallelizable parameter estimation and prediction method is presented. The key idea is to divide the spatial domain into overlapping subsets and to use cross-validation (CV) to estimate the covariance parameters in parallel. Although simulations show that CV is less effective for parameter estimation than the maximum likelihood method, it is amenable to parallel computing and enables the handling of large datasets. Exploiting the screen effect for spatial prediction helps to arrive at a spatial analysis that is close to a global computation despite performing parallel computations on local regions. Simulation studies assess the accuracy of the parameter estimates and predictions. The implementation shows good weak and strong parallel scaling properties. For illustration, an exponential covariance model is fitted to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration down to 1.5 minutes.
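
The divide-and-estimate idea in this abstract can be illustrated with a minimal numpy sketch: the domain is split into overlapping windows, a leave-one-out CV score is minimized over a grid of range parameters within each window (each window could run on its own worker), and the local estimates are combined. All sizes, windows, and values here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_cov(d, rng_par, sigma2=1.0):
    """Exponential covariance: sigma2 * exp(-d / range)."""
    return sigma2 * np.exp(-d / rng_par)

# Simulate a 1-D Gaussian process with exponential covariance.
n = 200
x = np.sort(rng.uniform(0.0, 10.0, n))
D = np.abs(x[:, None] - x[None, :])
true_range = 1.5
y = np.linalg.cholesky(exp_cov(D, true_range) + 1e-8 * np.eye(n)) @ rng.standard_normal(n)

def loo_cv_score(idx, rng_par):
    """Mean squared leave-one-out kriging error on one spatial subset."""
    K = exp_cov(D[np.ix_(idx, idx)], rng_par) + 1e-8 * np.eye(len(idx))
    Kinv = np.linalg.inv(K)
    # Standard LOO identity for kriging residuals: e_i = (Kinv y)_i / Kinv_ii.
    e = (Kinv @ y[idx]) / np.diag(Kinv)
    return float(np.mean(e ** 2))

# Overlapping windows; each window's grid search could run on its own core.
subsets = [np.where((x >= lo) & (x < lo + 4.0))[0] for lo in (0.0, 3.0, 6.0)]
grid = np.linspace(0.3, 4.0, 30)
local = [grid[np.argmin([loo_cv_score(s, r) for r in grid])] for s in subsets]
range_hat = float(np.mean(local))  # combine the local CV estimates
```

In the paper's setting, each window's search would be dispatched to a separate processor; here the loop over `subsets` stands in for that parallel map.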

  • Fast covariance parameter estimation of spatial Gaussian process models using neural networks
    arXiv: Machine Learning, 2020
    Co-Authors: Florian Gerber, Douglas Nychka
    Abstract:

    Gaussian processes (GPs) are a popular model for spatially referenced data and allow descriptive statements, predictions at new locations, and simulation of new fields. Often a few parameters are sufficient to parameterize the covariance function, and maximum likelihood (ML) methods can be used to estimate these parameters from data. ML methods, however, are computationally demanding. For example, in the case of local likelihood estimation, even fitting covariance models on modest-size windows can overwhelm typical computational resources for data analysis. This limitation motivates the idea of using neural network (NN) methods to approximate ML estimates. We train NNs to take moderate-size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters. Once trained, the NNs provide estimates with accuracy similar to that of ML estimation, at a speedup of a factor of 100 or more. Although we focus on a specific covariance estimation problem motivated by a climate science application, this work can be easily extended to other, more complex spatial problems and provides a proof of concept for this use of machine learning in computational statistics.
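
A toy version of the variogram-to-parameter mapping can be sketched in plain numpy: empirical variograms of simulated fields serve as inputs, and a small one-hidden-layer network is trained by gradient descent to return the range parameter. The architecture, lags, and training setup are illustrative assumptions, far smaller than what the paper trains.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lags = 50, np.arange(1, 8)
x = np.linspace(0.0, 10.0, n)
D = np.abs(x[:, None] - x[None, :])

def simulate(range_par):
    """Draw a 1-D field with exponential covariance of the given range."""
    L = np.linalg.cholesky(np.exp(-D / range_par) + 1e-8 * np.eye(n))
    return L @ rng.standard_normal(n)

def variogram(y):
    """Empirical semivariogram at small integer index lags."""
    return np.array([0.5 * np.mean((y[h:] - y[:-h]) ** 2) for h in lags])

# Training set: variogram features -> true range parameter.
ranges = rng.uniform(0.5, 3.0, 400)
X = np.stack([variogram(simulate(r)) for r in ranges])
t = ranges

# One-hidden-layer network trained with plain batch gradient descent.
W1 = 0.1 * rng.standard_normal((len(lags), 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal(16); b2 = 0.0
lr = 1e-3
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    g = 2.0 * (pred - t) / len(t)              # d(MSE)/d(pred)
    W2 -= lr * H.T @ g; b2 -= lr * g.sum()
    gH = np.outer(g, W2) * (1.0 - H ** 2)       # backprop through tanh
    W1 -= lr * X.T @ gH; b1 -= lr * gH.sum(axis=0)

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - t) ** 2))
```

Once such a network is trained, producing an estimate costs one forward pass instead of repeated likelihood evaluations, which is the source of the speedup the abstract describes.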

François Bachoc - One of the best experts on this subject based on the ideXlab platform.

  • Finite-dimensional Gaussian approximation with linear inequality constraints
    SIAM ASA Journal on Uncertainty Quantification, 2018
    Co-Authors: Andrés F. López-lopera, François Bachoc, Nicolas Durrande, Olivier Roustant
    Abstract:

    Introducing inequality constraints in Gaussian process (GP) models can lead to more realistic uncertainties in a great variety of real-world learning problems. We consider the finite-dimensional Gaussian approach of Maatouk and Bay (2017), which can satisfy inequality conditions everywhere (either boundedness, monotonicity, or convexity). Our contributions are threefold. First, we extend their approach to deal with general sets of linear inequalities. Second, we explore several Markov chain Monte Carlo (MCMC) techniques to approximate the posterior distribution. Third, we investigate theoretical and numerical properties of the constrained likelihood for covariance parameter estimation. According to experiments on both artificial and real data, our full framework together with a Hamiltonian Monte Carlo-based sampler provides efficient results on both data fitting and uncertainty quantification.
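
The finite-dimensional construction can be illustrated with a small numpy sketch: the GP is represented by its values at a few knots (coefficients of piecewise-linear hat functions), so monotonicity of the whole path reduces to the knot values being nondecreasing. For brevity the constrained prior is sampled by rejection rather than by the MCMC schemes the paper investigates; the kernel, knot count, and sample count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
knots = np.linspace(0.0, 1.0, 6)
# Prior covariance of the GP evaluated at the knots (exponential kernel).
K = np.exp(-np.abs(knots[:, None] - knots[None, :]) / 0.5)
L = np.linalg.cholesky(K + 1e-10 * np.eye(6))

# In the hat-function basis, the path is monotone iff the knot values are
# nondecreasing, so the infinite-dimensional constraint becomes a finite
# set of linear inequalities on the coefficients.
samples = []
for _ in range(200000):
    if len(samples) == 20:
        break
    xi = L @ rng.standard_normal(6)
    if np.all(np.diff(xi) >= 0.0):
        samples.append(xi)
```

Rejection sampling is only workable in low dimension; the paper's point is precisely that efficient samplers (e.g. Hamiltonian Monte Carlo) are needed once the number of basis functions grows.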

  • Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case
    arXiv: Statistics Theory, 2014
    Co-Authors: François Bachoc
    Abstract:

    In the parametric estimation of the covariance function of a Gaussian process, it is often the case that the true covariance function does not belong to the parametric set used for estimation. This situation is called the misspecified case. In this case, it has been shown that, for irregular spatial sampling of observation points, Cross Validation can yield smaller prediction errors than Maximum Likelihood. Motivated by this observation, we provide a general asymptotic analysis of the misspecified case for independent and uniformly distributed observation points. We prove that the Maximum Likelihood estimator asymptotically minimizes a Kullback-Leibler divergence within the misspecified parametric set, while Cross Validation asymptotically minimizes the integrated square prediction error. In a Monte Carlo simulation, we show that the covariance parameters estimated by Maximum Likelihood and Cross Validation, and the corresponding Kullback-Leibler divergences and integrated square prediction errors, can be strongly contrasting. On a more technical level, we provide new increasing-domain asymptotic results for independent and uniformly distributed observation points.
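
The contrast between the two criteria can be reproduced in a small simulation, in the spirit of (but much simpler than) the paper's Monte Carlo study: data come from an exponential covariance while the fitted family is squared-exponential (misspecified); ML minimizes the negative log-likelihood and CV the leave-one-out squared error over the same range grid. All settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 120
x = np.sort(rng.uniform(0.0, 10.0, n))
D = np.abs(x[:, None] - x[None, :])

# True model: exponential covariance; fitted family: squared exponential.
K_true = np.exp(-D / 1.0)
y = np.linalg.cholesky(K_true + 1e-8 * np.eye(n)) @ rng.standard_normal(n)

def sqexp(r):
    # Small nugget for numerical stability of the ill-conditioned kernel.
    return np.exp(-((D / r) ** 2)) + 1e-4 * np.eye(n)

def neg_loglik(r):
    K = sqexp(r)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y))

def loo_cv(r):
    # Leave-one-out residuals via the kriging identity e_i = (Kinv y)_i / Kinv_ii.
    Kinv = np.linalg.inv(sqexp(r))
    return np.mean(((Kinv @ y) / np.diag(Kinv)) ** 2)

grid = np.linspace(0.2, 3.0, 40)
r_ml = float(grid[np.argmin([neg_loglik(r) for r in grid])])
r_cv = float(grid[np.argmin([loo_cv(r) for r in grid])])
```

Under misspecification the two criteria generally select different range values, mirroring the paper's result that ML targets a Kullback-Leibler projection while CV targets the prediction error.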

  • Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case
    Journal of Multivariate Analysis, 2014
    Co-Authors: François Bachoc
    Abstract:

    In the parametric estimation of the covariance function of a Gaussian process, it is often the case that the true covariance function does not belong to the parametric set used for estimation. This situation is called the misspecified case. In this case, it has been observed that, for irregular spatial sampling of observation points, Cross Validation can yield smaller prediction errors than Maximum Likelihood. Motivated by this comparison, we provide a general asymptotic analysis of the misspecified case for independent observation points with uniform distribution. We prove that the Maximum Likelihood estimator asymptotically minimizes a Kullback-Leibler divergence within the misspecified parametric set, while Cross Validation asymptotically minimizes the integrated square prediction error. In a Monte Carlo simulation, we show that the covariance parameters estimated by Maximum Likelihood and Cross Validation, and the corresponding Kullback-Leibler divergences and integrated square prediction errors, can be strongly contrasting. On a more technical level, we provide new increasing-domain asymptotic results for the situation where the eigenvalues of the covariance matrices involved are not upper bounded.

  • Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes
    2013
    Co-Authors: François Bachoc
    Abstract:

    Covariance parameter estimation of Gaussian processes is analyzed in an asymptotic framework. The spatial sampling is a randomly perturbed regular grid, and its deviation from the perfect regular grid is controlled by a single scalar regularity parameter. Consistency and asymptotic normality are proved for the Maximum Likelihood and Cross Validation estimators of the covariance parameters. The asymptotic covariance matrices of the covariance parameter estimators are deterministic functions of the regularity parameter. By means of an exhaustive study of the asymptotic covariance matrices, it is shown that the estimation is improved when the regular grid is strongly perturbed. Hence, an asymptotic confirmation is given to the commonly admitted fact that using groups of observation points with small spacing is beneficial to covariance function estimation. Finally, the prediction error, using a consistent estimator of the covariance parameters, is analyzed in detail.
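
The perturbed-grid design can be mimicked in a few lines of numpy: observation sites are a regular grid plus uniform jitter scaled by a regularity parameter, and the ML range estimate is computed over replicates for an unperturbed and a strongly perturbed design, so the spread of the estimates can be compared. Grid size, range, and replicate counts are illustrative, far from the paper's asymptotic regime.

```python
import numpy as np

rng = np.random.default_rng(4)
base = np.arange(30, dtype=float)       # perfect regular grid
rs = np.linspace(0.3, 3.0, 28)          # candidate range parameters

def ml_range_estimates(eps, reps=30):
    """ML range estimates from a regular grid jittered by +/- eps/2."""
    out = []
    for _ in range(reps):
        x = base + eps * (rng.uniform(size=base.size) - 0.5)
        D = np.abs(x[:, None] - x[None, :])
        K0 = np.exp(-D / 1.0) + 1e-8 * np.eye(base.size)
        y = np.linalg.cholesky(K0) @ rng.standard_normal(base.size)

        def nll(r):
            K = np.exp(-D / r) + 1e-8 * np.eye(base.size)
            _, logdet = np.linalg.slogdet(K)
            return 0.5 * (logdet + y @ np.linalg.solve(K, y))

        out.append(rs[np.argmin([nll(r) for r in rs])])
    return np.array(out)

sd_regular = ml_range_estimates(0.0).std()    # eps = 0: perfect grid
sd_perturbed = ml_range_estimates(0.8).std()  # strongly perturbed grid
```

In the paper's asymptotic analysis the perturbed design yields smaller estimator variance; a small finite-sample run like this one only gives a noisy indication of that effect.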

Douglas W Nychka - One of the best experts on this subject based on the ideXlab platform.

  • Parallel cross-validation: A scalable fitting method for Gaussian process models
    Computational Statistics & Data Analysis, 2021
    Co-Authors: Florian Gerber, Douglas W Nychka
    Abstract:

    Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. They are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide sub-optimal support for the parallel computing capabilities of (high-performance) computing environments. To bridge this gap, a parallelizable parameter estimation and prediction method is presented. The key idea is to divide the spatial domain into overlapping subsets and to use cross-validation (CV) to estimate the covariance parameters in parallel. Although simulations show that CV is less effective for parameter estimation than the maximum likelihood method, it is amenable to parallel computing and enables the handling of large datasets. Exploiting the screen effect for spatial prediction helps to arrive at a spatial analysis that is close to a global computation despite performing parallel computations on local regions. Simulation studies assess the accuracy of the parameter estimates and predictions. The implementation shows good weak and strong parallel scaling properties. For illustration, an exponential covariance model is fitted to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration down to 1.5 minutes.

Douglas Nychka - One of the best experts on this subject based on the ideXlab platform.

  • Fast covariance parameter estimation of spatial Gaussian process models using neural networks
    arXiv: Machine Learning, 2020
    Co-Authors: Florian Gerber, Douglas Nychka
    Abstract:

    Gaussian processes (GPs) are a popular model for spatially referenced data and allow descriptive statements, predictions at new locations, and simulation of new fields. Often a few parameters are sufficient to parameterize the covariance function, and maximum likelihood (ML) methods can be used to estimate these parameters from data. ML methods, however, are computationally demanding. For example, in the case of local likelihood estimation, even fitting covariance models on modest-size windows can overwhelm typical computational resources for data analysis. This limitation motivates the idea of using neural network (NN) methods to approximate ML estimates. We train NNs to take moderate-size spatial fields or variograms as input and return the range and noise-to-signal covariance parameters. Once trained, the NNs provide estimates with accuracy similar to that of ML estimation, at a speedup of a factor of 100 or more. Although we focus on a specific covariance estimation problem motivated by a climate science application, this work can be easily extended to other, more complex spatial problems and provides a proof of concept for this use of machine learning in computational statistics.

Arthur V Peterson - One of the best experts on this subject based on the ideXlab platform.

  • A comparison of generalized linear mixed model procedures with estimating equations for variance and covariance parameter estimation in longitudinal studies and group randomized trials
    Statistics in Medicine, 2001
    Co-Authors: Brent A Evans, Ziding Feng, Arthur V Peterson
    Abstract:

    Response data in longitudinal studies and group randomized trials are gathered on units that belong to clusters, within which data are usually positively correlated. Therefore, estimates and confidence intervals for the intraclass correlation or variance components are helpful when designing a longitudinal study or group randomized trial. Data simulated from both study designs are used to investigate the estimation of variance and covariance parameters from the following procedures: for continuous outcomes, restricted maximum likelihood (REML) and estimating equations (EE); for binary outcomes, restricted pseudo-likelihood (REPL) and estimating equations (EE). We evaluate these procedures to see which provide valid and precise estimates as well as correct standard errors for the intraclass correlation coefficient or variance components. REML seems the better choice for estimating terms related to correlation for models with normal outcomes, especially in group randomized trial situations. Results for REML and EE are mixed when outcomes are continuous and non-normal. With binary outcomes, neither REPL nor EE provides satisfactory estimation or inference in longitudinal study situations, while REPL is preferable for group randomized trials.
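
The role of the intraclass correlation in these designs can be illustrated with a short numpy simulation: clustered data are generated with known between- and within-cluster variances, and the ICC is recovered with the one-way ANOVA (method-of-moments) estimator. This simple moment estimator stands in for the REML/EE machinery compared in the paper, and the variance settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k = 40, 25                           # clusters and members per cluster
sigma_b2, sigma_e2 = 0.1, 1.0           # between- / within-cluster variance
true_icc = sigma_b2 / (sigma_b2 + sigma_e2)

b = rng.normal(0.0, np.sqrt(sigma_b2), m)               # cluster random effects
y = b[:, None] + rng.normal(0.0, np.sqrt(sigma_e2), (m, k))

# One-way ANOVA (method-of-moments) estimator of the intraclass correlation.
grand = y.mean()
msb = k * np.sum((y.mean(axis=1) - grand) ** 2) / (m - 1)       # between-cluster MS
msw = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2) / (m * (k - 1))  # within MS
icc_hat = float((msb - msw) / (msb + (k - 1) * msw))
```

Even a small ICC inflates the effective variance of cluster-level comparisons by the design effect 1 + (k - 1) * ICC, which is why precise ICC estimates matter when planning group randomized trials.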