Subsampling

The Experts below are selected from a list of 360 Experts worldwide, ranked by the ideXlab platform.

Dimitris N Politis - One of the best experts on this subject based on the ideXlab platform.

  • Fixed b subsampling and the block bootstrap: improved confidence sets based on p-value calibration
    Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2013
    Co-Authors: Xiaofeng Shao, Dimitris N Politis
    Abstract:

    Subsampling and block-based bootstrap methods have been used in a wide range of inference problems for time series. To accommodate the dependence, these resampling methods involve a bandwidth parameter, such as the subsampling window width and block size in the block-based bootstrap. In empirical work, using different bandwidth parameters could lead to different inference results, but traditional first-order asymptotic theory does not capture the choice of the bandwidth. We propose to adopt the fixed-b approach, as advocated by Kiefer and Vogelsang in the heteroscedasticity–autocorrelation robust testing context, to account for the influence of the bandwidth on inference. Under the fixed-b asymptotic framework, we derive the asymptotic null distribution of the p-values for subsampling and the moving block bootstrap, and further propose a calibration of the traditional small-b-based confidence intervals (regions or bands) and tests. Our treatment is fairly general as it includes both finite-dimensional parameters and infinite-dimensional parameters, such as the marginal distribution function. Simulation results show that the fixed-b approach is more accurate than the traditional small-b approach in terms of approximating the finite-sample distribution, and that the calibrated confidence sets tend to have smaller coverage errors than their uncalibrated counterparts.

  • Subsampling inference for the mean of heavy-tailed long-memory time series
    Journal of Time Series Analysis, 2012
    Co-Authors: Agnieszka Jach, Tucker McElroy, Dimitris N Politis
    Abstract:

    In this article, we revisit a time series model introduced by McElroy and Politis (2007a) and generalize it in several ways to encompass a wider class of stationary, nonlinear, heavy-tailed time series with long memory. The joint asymptotic distribution for the sample mean and sample variance under the extended model is derived; the associated convergence rates are found to depend crucially on the tail thickness and the long-memory parameter. A self-normalized sample mean that concurrently captures the tail and memory behaviour is defined. Its asymptotic distribution is approximated by subsampling without knowledge of the tail and/or memory parameters; a result of independent interest regarding subsampling consistency for certain long-range dependent processes is provided. The subsampling-based confidence intervals for the process mean are shown to have good empirical coverage rates in a simulation study. The influence of block size on the coverage and the performance of a data-driven rule for block size selection are assessed. The methodology is further applied to the series of packet counts from Ethernet traffic traces.
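
As a concrete companion to the Shao–Politis entry above: a minimal NumPy sketch (illustrative only, not the authors' code; all names below are made up for the example) of the moving-block-bootstrap p-value whose dependence on the block width b is exactly the sensitivity the fixed-b calibration accounts for.

```python
import numpy as np

def mbb_pvalue(x, mu0, b, n_boot=2000, rng=None):
    """Two-sided moving-block-bootstrap p-value for H0: mean(x) = mu0.

    b is the block width (the bandwidth parameter discussed above);
    rerunning with different b can change the verdict, which is the
    sensitivity the fixed-b calibration is designed to correct.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    stat = np.sqrt(n) * (x.mean() - mu0)
    # all overlapping blocks of the centred series (resampling under H0)
    blocks = np.lib.stride_tricks.sliding_window_view(x - x.mean(), b)
    k = -(-n // b)  # ceil(n / b): blocks needed per bootstrap series
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(blocks), size=k)
        xb = np.concatenate(blocks[idx])[:n]  # glue blocks, trim to length n
        boot[i] = np.sqrt(n) * xb.mean()
    return np.mean(np.abs(boot) >= np.abs(stat))

# toy usage on an AR(1) series: note how the p-value moves with b
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
print({b: round(mbb_pvalue(x, 0.0, b, rng=rng), 3) for b in (5, 20, 50)})
```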
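
For the heavy-tailed long-memory entry, the operational point is that a self-normalized statistic lets subsampling proceed without estimating the tail index or the memory parameter. A hedged sketch, assuming a simple t-type normalizer (the paper's actual self-normalization is more refined):

```python
import numpy as np

def selfnorm_subsampling_ci(x, b, level=0.95):
    """Subsampling confidence interval for the process mean.

    The t-type statistic sqrt(m) * (mean - mu) / sd is scale-free, so its
    subsampling distribution can be used without knowing the tail index
    alpha or the memory parameter d (illustrative normalizer only).
    """
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)
    t_b = np.sqrt(b) * (blocks.mean(axis=1) - xbar) / blocks.std(axis=1, ddof=1)
    q_lo, q_hi = np.quantile(t_b, [(1 - level) / 2, (1 + level) / 2])
    # invert the statistic to get an equal-tailed interval for the mean
    return xbar - q_hi * s / np.sqrt(n), xbar - q_lo * s / np.sqrt(n)

# toy usage: Lomax/Pareto-type noise with finite mean but infinite variance
rng = np.random.default_rng(1)
x = rng.pareto(1.5, size=2000) - 2.0  # Lomax(1.5) has mean 1/(1.5-1) = 2
print(selfnorm_subsampling_ci(x, b=100))
```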

Agnieszka Jach - One of the best experts on this subject based on the ideXlab platform.

  • Subsampling inference for the autocovariances and autocorrelations of long-memory heavy-tailed linear time series
    Journal of Time Series Analysis, 2012
    Co-Authors: Tucker McElroy, Agnieszka Jach
    Abstract:

    We provide a self-normalization for the sample autocovariances and autocorrelations of a linear, long-memory time series with innovations that have either finite fourth moment or are heavy-tailed with tail index 2 < α < 4. In the asymptotic distribution of the sample autocovariance there are three rates of convergence that depend on the interplay between the memory parameter d and α, and which consequently lead to three different limit distributions; for the sample autocorrelation the limit distribution only depends on d. We introduce a self-normalized sample autocovariance statistic, which is computable without knowledge of α or d (or their relationship), and which converges to a non-degenerate distribution. We also treat self-normalization of the autocorrelations. The sampling distributions can then be approximated non-parametrically by subsampling, as the corresponding asymptotic distribution is still parameter-dependent. The subsampling-based confidence intervals for the process autocovariances and autocorrelations are shown to have satisfactory empirical coverage rates in a simulation study. The impact of subsampling block size on the coverage is assessed. The methodology is further applied to the log-squared returns of Merck stock.

  • Subsampling inference for the mean of heavy-tailed long-memory time series
    Journal of Time Series Analysis, 2012
    Co-Authors: Agnieszka Jach, Tucker McElroy, Dimitris N Politis
    Abstract:

    In this article, we revisit a time series model introduced by McElroy and Politis (2007a) and generalize it in several ways to encompass a wider class of stationary, nonlinear, heavy-tailed time series with long memory. The joint asymptotic distribution for the sample mean and sample variance under the extended model is derived; the associated convergence rates are found to depend crucially on the tail thickness and the long-memory parameter. A self-normalized sample mean that concurrently captures the tail and memory behaviour is defined. Its asymptotic distribution is approximated by subsampling without knowledge of the tail and/or memory parameters; a result of independent interest regarding subsampling consistency for certain long-range dependent processes is provided. The subsampling-based confidence intervals for the process mean are shown to have good empirical coverage rates in a simulation study. The influence of block size on the coverage and the performance of a data-driven rule for block size selection are assessed. The methodology is further applied to the series of packet counts from Ethernet traffic traces.
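
A similar recipe applies to the McElroy–Jach autocovariance/autocorrelation entry at the top of this section. The sketch below subsamples the lag-h sample autocorrelation, which is already a scale-free ratio; it deliberately omits the rate correction that the paper's self-normalization supplies, so the interval should be read as a conservative illustration only.

```python
import numpy as np

def acf_lag(x, h):
    """Sample autocorrelation at lag h (mean-corrected)."""
    z = x - x.mean()
    return np.dot(z[:-h], z[h:]) / np.dot(z, z)

def subsample_acf_ci(x, h, b, level=0.95):
    """Conservative subsampling interval for the lag-h autocorrelation.

    Quantiles of rho_hat(block) - rho_hat(full) over all length-b blocks;
    no tail index alpha or memory parameter d is needed, but without the
    paper's self-normalization the interval is wider than necessary.
    """
    rho = acf_lag(x, h)
    blocks = np.lib.stride_tricks.sliding_window_view(x, b)
    dev = np.array([acf_lag(blk, h) for blk in blocks]) - rho
    q_lo, q_hi = np.quantile(dev, [(1 - level) / 2, (1 + level) / 2])
    return rho - q_hi, rho - q_lo

# toy usage: heavy-tailed innovations passed through a persistent filter
rng = np.random.default_rng(2)
e = rng.standard_t(df=3, size=3000)
x = np.convolve(e, 0.7 ** np.arange(30), mode="valid")
print(subsample_acf_ci(x, h=1, b=200))
```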

Anil K Seth - One of the best experts on this subject based on the ideXlab platform.

  • Detectability of Granger causality for subsampled continuous-time neurophysiological processes
    Journal of Neuroscience Methods, 2017
    Co-Authors: Lionel Barnett, Anil K Seth
    Abstract:

    Background: Granger causality is well established within the neurosciences for inference of directed functional connectivity from neurophysiological data. These data usually consist of time series which subsample a continuous-time biophysiological process. While it is well known that subsampling can lead to imputation of spurious causal connections where none exist, less is known about the effects of subsampling on the ability to reliably detect causal connections which do exist. New method: We present a theoretical analysis of the effects of subsampling on Granger-causal inference. Neurophysiological processes typically feature signal propagation delays on multiple time scales; accordingly, we base our analysis on a distributed-lag, continuous-time stochastic model, and consider Granger causality in continuous time at finite prediction horizons. Via exact analytical solutions, we identify relationships among sampling frequency, underlying causal time scales and detectability of causalities. Results: We reveal complex interactions between the time scale(s) of neural signal propagation and sampling frequency. We demonstrate that detectability decays exponentially as the sample time interval increases beyond causal delay times, identify detectability “black spots” and “sweet spots”, and show that downsampling may potentially improve detectability. We also demonstrate that the invariance of Granger causality under causal, invertible filtering fails at finite prediction horizons, with particular implications for inference of Granger causality from fMRI data. Comparison with existing methods: Our analysis emphasises that sampling rates for causal analysis of neurophysiological time series should be informed by domain-specific time scales, and that state-space modelling should be preferred to purely autoregressive modelling. Conclusions: On the basis of a very general model that captures the structure of neurophysiological processes, we are able to help identify confounds, and offer practical insights, for successful detection of causal connectivity from neurophysiological recordings.

  • Detectability of Granger causality for subsampled continuous-time neurophysiological processes
    arXiv: Applications, 2016
    Co-Authors: Lionel Barnett, Anil K Seth
    Abstract:

    Granger causality is well established within the neurosciences for inference of directed functional connectivity from neurophysiological data. These data usually consist of time series which subsample a continuous-time biophysiological process. While it is well known that subsampling can lead to imputation of spurious causal connections where none exist, here we address the equally important issue of the effects of subsampling on the ability to reliably detect causal connections which do exist. Neurophysiological processes typically feature signal propagation delays on multiple time scales; accordingly, we base our analysis on a distributed-lag, continuous-time stochastic model, and consider Granger causality in continuous time at finite prediction horizons. Via exact analytical solutions, we identify relationships among sampling frequency, underlying causal time scales and detectability of causalities. Our analysis reveals complex interactions between the time scale(s) of neural signal propagation and sampling frequency: we demonstrate that Granger causality decays exponentially as the sample time interval increases beyond causal delay times, identify detectability "black spots" and "sweet spots", and show that subsampling may sometimes improve detectability. We also demonstrate that the invariance of Granger causality under causal, invertible filtering fails at finite prediction horizons. We discuss the implications of our results for inference of Granger causality at the neural level from various neurophysiological recording modes, and emphasise that sampling rates for causal analysis of neurophysiological time series should be informed by domain-specific time scales.
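
The qualitative prediction of the two Barnett–Seth entries above is easy to reproduce numerically. The sketch below is not their continuous-time distributed-lag model; it is a discrete bivariate AR(1) proxy (all coefficients chosen arbitrarily) showing the estimated time-domain Granger causality x → y shrinking as the subsampling factor k grows past the unit causal delay.

```python
import numpy as np

def resid_var(y, regressors, p):
    """Residual variance of OLS of y[t] on p lags of each regressor column."""
    n = len(y)
    rows = [np.concatenate([regressors[t - k] for k in range(1, p + 1)])
            for t in range(p, n)]
    Z = np.column_stack([np.ones(n - p), np.array(rows)])
    beta, *_ = np.linalg.lstsq(Z, y[p:], rcond=None)
    r = y[p:] - Z @ beta
    return r @ r / len(r)

def gc_x_to_y(x, y, p=4):
    """Time-domain Granger causality x -> y: log residual-variance ratio."""
    full = resid_var(y, np.column_stack([y, x]), p)        # y's and x's past
    restricted = resid_var(y, y[:, None], p)               # y's past only
    return np.log(restricted / full)

# fine-scale system with a genuine causal link x -> y at unit delay
rng = np.random.default_rng(3)
T = 50_000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
    y[t] = 0.9 * y[t - 1] + 0.2 * x[t - 1] + rng.standard_normal()

# detectability decays as the sampling interval k passes the causal delay
for k in (1, 2, 5, 10, 20):
    print(k, round(gc_x_to_y(x[::k], y[::k]), 4))
```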

Tucker McElroy - One of the best experts on this subject based on the ideXlab platform.

  • Subsampling inference for the autocovariances and autocorrelations of long-memory heavy-tailed linear time series
    Journal of Time Series Analysis, 2012
    Co-Authors: Tucker McElroy, Agnieszka Jach
    Abstract:

    We provide a self-normalization for the sample autocovariances and autocorrelations of a linear, long-memory time series with innovations that have either finite fourth moment or are heavy-tailed with tail index 2 < α < 4. In the asymptotic distribution of the sample autocovariance there are three rates of convergence that depend on the interplay between the memory parameter d and α, and which consequently lead to three different limit distributions; for the sample autocorrelation the limit distribution only depends on d. We introduce a self-normalized sample autocovariance statistic, which is computable without knowledge of α or d (or their relationship), and which converges to a non-degenerate distribution. We also treat self-normalization of the autocorrelations. The sampling distributions can then be approximated non-parametrically by subsampling, as the corresponding asymptotic distribution is still parameter-dependent. The subsampling-based confidence intervals for the process autocovariances and autocorrelations are shown to have satisfactory empirical coverage rates in a simulation study. The impact of subsampling block size on the coverage is assessed. The methodology is further applied to the log-squared returns of Merck stock.

  • Subsampling inference for the mean of heavy-tailed long-memory time series
    Journal of Time Series Analysis, 2012
    Co-Authors: Agnieszka Jach, Tucker McElroy, Dimitris N Politis
    Abstract:

    In this article, we revisit a time series model introduced by McElroy and Politis (2007a) and generalize it in several ways to encompass a wider class of stationary, nonlinear, heavy-tailed time series with long memory. The joint asymptotic distribution for the sample mean and sample variance under the extended model is derived; the associated convergence rates are found to depend crucially on the tail thickness and the long-memory parameter. A self-normalized sample mean that concurrently captures the tail and memory behaviour is defined. Its asymptotic distribution is approximated by subsampling without knowledge of the tail and/or memory parameters; a result of independent interest regarding subsampling consistency for certain long-range dependent processes is provided. The subsampling-based confidence intervals for the process mean are shown to have good empirical coverage rates in a simulation study. The influence of block size on the coverage and the performance of a data-driven rule for block size selection are assessed. The methodology is further applied to the series of packet counts from Ethernet traffic traces.

Rong Zhu - One of the best experts on this subject based on the ideXlab platform.

  • Optimal subsampling for large sample logistic regression
    Journal of the American Statistical Association, 2018
    Co-Authors: HaiYing Wang, Rong Zhu
    Abstract:

    For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities.

  • Optimal subsampling for large sample logistic regression
    arXiv: Computation, 2017
    Co-Authors: HaiYing Wang, Rong Zhu
    Abstract:

    For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and has a significant reduction in computing time compared to the full data approach. Consistency and asymptotic normality of the estimator from a two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
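
To make the two-step algorithm concrete: a hedged NumPy sketch, assuming subsampling probabilities proportional to |y − p|·‖x‖ (one of the optimality criteria the paper describes); the exact mMSE/mVc weighting and implementation details are the paper's, and everything below, including the function names, is a simplified illustration.

```python
import numpy as np

def logistic_mle(X, y, w=None, iters=25, tol=1e-8):
    """(Weighted) logistic-regression MLE via Newton-Raphson."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsampling(X, y, r0=500, r=2000, rng=None):
    """Two-step optimal-subsampling sketch for logistic regression.

    Step 1: uniform pilot subsample of size r0 -> pilot estimate.
    Step 2: subsampling probabilities proportional to |y - p| * ||x||,
    then a weighted MLE on the r-subsample with inverse-probability weights.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    pilot = rng.choice(n, size=r0, replace=True)
    beta0 = logistic_mle(X[pilot], y[pilot])
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    pi = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi /= pi.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return logistic_mle(X[idx], y[idx], w=1.0 / (n * pi[idx]))

# toy usage: the subsample estimate should sit close to beta_true
rng = np.random.default_rng(4)
n = 100_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 4))])
beta_true = np.array([0.5, 1.0, -1.0, 0.5, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
print(two_step_subsampling(X, y, rng=rng))
```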