Nonzero Sample

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 21 Experts worldwide ranked by ideXlab platform

David C Hoyle - One of the best experts on this subject based on the ideXlab platform.

  • Accuracy of Pseudo-Inverse Covariance Learning—A Random Matrix Theory Analysis
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
    Co-Authors: David C Hoyle
    Abstract:

    For many learning problems, estimates of the inverse population covariance are required and often obtained by inverting the sample covariance matrix. Increasingly for modern scientific data sets, the number of sample points is less than the number of features and so the sample covariance is not invertible. In such circumstances, the Moore-Penrose pseudo-inverse sample covariance matrix, constructed from the eigenvectors corresponding to nonzero sample covariance eigenvalues, is often used as an approximation to the inverse population covariance matrix. The reconstruction error of the pseudo-inverse sample covariance matrix in estimating the true inverse covariance can be quantified via the Frobenius norm of the difference between the two. The reconstruction error is dominated by the smallest nonzero sample covariance eigenvalues and diverges as the sample size becomes comparable to the number of features. For high-dimensional data, we use random matrix theory techniques and results to study the reconstruction error for a wide class of population covariance matrices. We also show how bagging and random subspace methods can result in a reduction in the reconstruction error and can be combined to improve the accuracy of classifiers that utilize the pseudo-inverse sample covariance matrix. We test our analysis on both simulated and benchmark data sets.
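
The construction described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's analysis; it assumes Gaussian data with an identity population covariance (so the true inverse is also the identity), and the function names `pinv_from_nonzero_eigs` and `reconstruction_error` as well as the tolerance `tol` are hypothetical choices made for illustration.

```python
import numpy as np


def pinv_from_nonzero_eigs(S, tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix built from its nonzero eigenpairs.

    `tol` is an assumed relative cutoff separating numerical zeros from
    genuinely nonzero eigenvalues.
    """
    evals, evecs = np.linalg.eigh(S)
    keep = evals > tol * evals.max()
    # Divide each retained eigenvector (column) by its eigenvalue, then recombine.
    return (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T


def reconstruction_error(n_samples, n_features, rng):
    """Frobenius norm of (pseudo-inverse sample covariance - true inverse).

    Assumption: the population covariance is the identity matrix.
    """
    X = rng.standard_normal((n_samples, n_features))
    S = np.cov(X, rowvar=False)  # rank is at most n_samples - 1 after centering
    S_pinv = pinv_from_nonzero_eigs(S)
    return np.linalg.norm(S_pinv - np.eye(n_features), ord="fro")


if __name__ == "__main__":
    p = 50
    for n in (10, 25, 40, 48):
        err = reconstruction_error(n, p, np.random.default_rng(0))
        print(f"n={n:2d}, p={p}: Frobenius error = {err:.2f}")
```

Under these assumptions the printed error should grow sharply as n approaches p, because the smallest nonzero sample eigenvalues shrink toward zero and their reciprocals dominate the pseudo-inverse.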

Victor Pérez-abreu - One of the best experts on this subject based on the ideXlab platform.

  • PCA and Eigen-inference for a Spiked Covariance Model with Largest Eigenvalues of Same
    2020
    Co-Authors: Addy Bolivar-cime, Victor Pérez-abreu
    Abstract:

    In this paper we work under the setting of data with high dimension d greater than the sample size n (HDLSS). We study asymptotics of the first p ≥ 2 sample eigenvalues and their corresponding eigenvectors under a spiked covariance model for which its first p largest population eigenvalues have the same asymptotic order of magnitude as d tends to infinity and the rest are constant. We get the asymptotic joint distribution of the nonzero sample eigenvalues when d → ∞ and the sample size n is fixed. We then prove that the p largest sample eigenvalues increase jointly at the same speed as their population counterpart, in the sense that the vector of ratios of the sample and population eigenvalues converges to a multivariate distribution when d → ∞ and n is fixed, and to the vector of ones when both d, n → ∞ and d ≫ n. We also show the subspace consistency of the corresponding sample eigenvectors when d goes to infinity and n is fixed. Furthermore, using the asymptotic joint distribution of the sample eigenvalues we study some inference problems for the spiked covariance model and propose hypothesis tests for a particular case of this model and confidence intervals for the p largest eigenvalues. A simulation is performed to assess the behavior of the proposed statistical methodologies.

  • PCA and eigen-inference for a spiked covariance model with largest eigenvalues of same asymptotic order
    Brazilian Journal of Probability and Statistics, 2014
    Co-Authors: Addy Bolivar-cime, Victor Pérez-abreu
    Abstract:

    In this paper we work under the setting of data with high dimension d greater than the sample size n (HDLSS). We study asymptotics of the first p ≥ 2 sample eigenvalues and their corresponding eigenvectors under a spiked covariance model for which its first p largest population eigenvalues have the same asymptotic order of magnitude as d tends to infinity and the rest are constant or bounded as d increases. We get the asymptotic joint distribution of the nonzero sample eigenvalues when d → ∞ and the sample size n is fixed. We then prove that the p largest sample eigenvalues increase jointly at the same speed as their population counterpart, in the sense that the vector of ratios of the sample and population eigenvalues converges to a multivariate distribution when d → ∞ and n is fixed, and to the vector of ones when both d, n → ∞ and d ≫ n. We also show the subspace consistency of the corresponding sample eigenvectors when d goes to infinity and n is fixed. Furthermore, using the asymptotic joint distribution of the sample eigenvalues we study some inference problems for the spiked covariance model and propose hypothesis tests for a particular case of this model and confidence intervals for the p largest eigenvalues. A simulation is performed to assess the behavior of the proposed statistical methodologies.
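
The ratio of sample to population eigenvalues discussed in these abstracts can be explored numerically. The following is a rough sketch, not the authors' simulation: it assumes zero-mean Gaussian data with a known mean, a diagonal population covariance whose first p eigenvalues are c_i · d and whose remaining eigenvalues are 1, and a hypothetical helper name `spiked_eigenvalue_ratios`.

```python
import numpy as np


def spiked_eigenvalue_ratios(d, n, spike_coeffs, rng):
    """Ratios of the p largest sample eigenvalues to their population counterparts.

    Assumptions (illustrative only): diagonal population covariance with
    eigenvalues c_i * d for the first p coordinates and 1 elsewhere, Gaussian
    data with known zero mean, sample covariance X^T X / n.
    `spike_coeffs` should be given in decreasing order.
    """
    p = len(spike_coeffs)
    pop_evals = np.ones(d)
    pop_evals[:p] = np.asarray(spike_coeffs, dtype=float) * d
    # Rows are observations; scale each coordinate by its population std. dev.
    X = rng.standard_normal((n, d)) * np.sqrt(pop_evals)
    # The nonzero eigenvalues of the d x d matrix X^T X / n coincide with the
    # eigenvalues of the much smaller n x n matrix X X^T / n.
    gram = X @ X.T / n
    sample_evals = np.linalg.eigvalsh(gram)[::-1]  # sorted in decreasing order
    return sample_evals[:p] / pop_evals[:p]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for d, n in ((2000, 5), (2000, 50), (2000, 200)):
        ratios = spiked_eigenvalue_ratios(d, n, (2.0, 1.0), rng)
        print(f"d={d}, n={n}: ratios = {np.round(ratios, 3)}")
```

With n held small the ratios fluctuate from run to run, while for larger n (still with d much larger than n) they tend to settle near one, which is the qualitative behavior the abstract describes.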

Rik Pintelon - One of the best experts on this subject based on the ideXlab platform.

  • On the Elimination of Bias Averaging-Errors in Proxy Records
    Mathematical Geosciences, 2009
    Co-Authors: Veerle Beelaerts, Fjo Ridder, Nele Schmitz, Maite Bauwens, Frank Dehairs, Johan Schoukens, Rik Pintelon
    Abstract:

    Knowledge of and insight into past environmental conditions can be obtained by processing and analyzing proxies. The proxies need to be processed as precisely and accurately as possible, otherwise the conclusion of the analysis will be biased. A calibration method which reduces bias errors in the proxy measurements due to averaging is presented. Sampling with nonzero sample sizes causes an averaging of the true proxy signal over the volume of the sample. The method is applied on a linear synthetic record which results in an optimal correction for frequency components ranging from the dc-frequency (DC) to one half of the sample frequency (f_s/2). Next, the method is tested on non-linear synthetic data where the signal is reconstructed reasonably well. Finally, the method is applied to a real vessel density record of R. mucronata from Makongeni, Kenya, and to a real delta deuterium record of ice core EDC from Dome C, Antarctica. The method discussed in this paper is a valuable tool for the calibration of proxy measurements; it can be applied as a correction for low resolution measurements and expanded to other types of samples and proxies. Working with small sample sizes (high resolution) amounts to working near the detection limit, where the signal-to-noise ratio is low. This correction method provides an alternative in which low resolution measurements can be upgraded to minimize the loss of information due to larger sample sizes.

  • On the Elimination of Bias Averaging-Errors in Proxy Records
    Mathematical Geosciences, 2008
    Co-Authors: Veerle Beelaerts, Fjo Ridder, Nele Schmitz, Maite Bauwens, Frank Dehairs, Johan Schoukens, Rik Pintelon
    Abstract:

    Knowledge of and insight into past environmental conditions can be obtained by processing and analyzing proxies. The proxies need to be processed as precisely and accurately as possible, otherwise the conclusion of the analysis will be biased. A calibration method which reduces bias errors in the proxy measurements due to averaging is presented. Sampling with nonzero sample sizes causes an averaging of the true proxy signal over the volume of the sample. The method is applied on a linear synthetic record which results in an optimal correction for frequency components ranging from the dc-frequency (DC) to one half of the sample frequency (f_s/2). Next, the method is tested on non-linear synthetic data where the signal is reconstructed reasonably well. Finally, the method is applied to a real vessel density record of R. mucronata from Makongeni, Kenya, and to a real delta deuterium record of ice core EDC from Dome C, Antarctica. The method discussed in this paper is a valuable tool for the calibration of proxy measurements; it can be applied as a correction for low resolution measurements and expanded to other types of samples and proxies. Working with small sample sizes (high resolution) amounts to working near the detection limit, where the signal-to-noise ratio is low. This correction method provides an alternative in which low resolution measurements can be upgraded to minimize the loss of information due to larger sample sizes.
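
The averaging effect of a nonzero sample size can be illustrated with a deliberately simplified model. The sketch below is not the calibration method of the paper; it assumes each measurement is a plain boxcar (uniform) average of the true signal over a window of length `width`, and the helper names `averaging_gain` and `boxcar_average` are hypothetical. Under that assumption a sinusoid of frequency f is attenuated by sin(π f w)/(π f w), which stays above 2/π for all f up to f_s/2 when samples are contiguous (f_s = 1/w), so dividing by the gain undoes the attenuation over that band.

```python
import math


def averaging_gain(freq, width):
    """Amplitude gain of a boxcar average of length `width` at frequency `freq`."""
    x = math.pi * freq * width
    return 1.0 if x == 0.0 else math.sin(x) / x


def boxcar_average(signal, center, width, steps=2000):
    """Midpoint-rule average of signal(t) over [center - width/2, center + width/2]."""
    h = width / steps
    start = center - width / 2.0
    return sum(signal(start + (k + 0.5) * h) for k in range(steps)) / steps


if __name__ == "__main__":
    width = 1.0   # assumed sample size (same units as the record's depth/time axis)
    freq = 0.4    # below f_s / 2 = 0.5 for contiguous samples of this width

    def true_signal(t):
        return math.cos(2.0 * math.pi * freq * t)

    for t in (0.0, 0.3, 0.7):
        measured = boxcar_average(true_signal, t, width)
        corrected = measured / averaging_gain(freq, width)
        print(f"t={t}: true={true_signal(t):.4f} "
              f"measured={measured:.4f} corrected={corrected:.4f}")
```

A real record contains many frequency components and noise, so a practical correction would have to be applied per frequency and regularized near the detection limit; the single-sinusoid case only shows why the bias arises and why it is invertible below f_s/2 in this idealized setting.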
