Bayes Error

The experts below are selected from a list of 12,498 experts worldwide, ranked by the ideXlab platform.

Peter Sollich - One of the best experts on this subject based on the ideXlab platform.

  • Learning curves for multi-task Gaussian process regression
    Neural Information Processing Systems, 2012
    Co-Authors: Peter Sollich, Simon R F Ashton
    Abstract:

    We study the average case performance of multi-task Gaussian process (GP) regression as captured in the learning curve, i.e. the average Bayes Error for a chosen task versus the total number of examples n for all tasks. For GP covariances that are the product of an input-dependent covariance function and a free-form intertask covariance matrix, we show that accurate approximations for the learning curve can be obtained for an arbitrary number of tasks T. We use these to study the asymptotic learning behaviour for large n. Surprisingly, multi-task learning can be asymptotically essentially useless, in the sense that examples from other tasks help only when the degree of inter-task correlation, ρ, is near its maximal value ρ = 1. This effect is most extreme for learning of smooth target functions as described by e.g. squared exponential kernels. We also demonstrate that when learning many tasks, the learning curves separate into an initial phase, where the Bayes Error on each task is reduced down to a plateau value by "collective learning" even though most tasks have not seen examples, and a final decay that occurs once the number of examples is proportional to the number of tasks.
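    A minimal numerical sketch of the quantity studied here: the learning curve point at n examples is the Bayes Error (mean posterior variance) on a chosen task, averaged over random datasets, for a GP whose covariance is the product of a squared exponential input kernel and a free-form inter-task matrix D. All parameter values below (T = 2 tasks, inter-task correlation rho, lengthscale, noise level) are illustrative assumptions, not the authors' setup.

    import numpy as np

    rng = np.random.default_rng(0)
    T, rho, ell, noise = 2, 0.8, 0.2, 0.1      # tasks, inter-task correlation, SE lengthscale, noise std
    D = np.array([[1.0, rho], [rho, 1.0]])     # free-form inter-task covariance matrix

    def se_kernel(a, b):
        # squared exponential kernel between two 1-d point sets
        return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

    def avg_bayes_error(n_total, n_datasets=50, n_test=200):
        # Bayes Error on task 0 = mean posterior variance, averaged over random datasets
        errs = []
        x_test = rng.uniform(0, 1, n_test)
        for _ in range(n_datasets):
            x = rng.uniform(0, 1, n_total)
            t = rng.integers(0, T, n_total)                    # task label of each example
            K = D[np.ix_(t, t)] * se_kernel(x, x)              # product (inter-task x input) covariance
            K_star = D[t, 0][None, :] * se_kernel(x_test, x)   # cross-covariance to test points on task 0
            A = np.linalg.solve(K + noise**2 * np.eye(n_total), K_star.T)
            post_var = D[0, 0] - np.sum(K_star * A.T, axis=1)
            errs.append(post_var.mean())
        return np.mean(errs)

    for n in [4, 16, 64, 256]:
        print(n, avg_bayes_error(n))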

  • Exact learning curves for Gaussian process regression on large random graphs
    Neural Information Processing Systems, 2010
    Co-Authors: Matthew Urry, Peter Sollich
    Abstract:

    We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes Error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate, we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdős-Rényi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail.
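    As a brute-force point of reference for the analytical predictions described above, the learning curve can also be estimated by averaging the GP posterior variance over many random datasets of n nodes. The sketch below does this on a Poisson (Erdős-Rényi) random graph with a random-walk kernel; the graph size, mean degree, kernel parameters and noise level are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    V, c = 200, 3.0                                   # number of vertices, mean degree
    A = (rng.random((V, V)) < c / V).astype(float)
    A = np.triu(A, 1); A = A + A.T                    # symmetric adjacency matrix, no self-loops

    # random-walk kernel C ∝ ((a - 1) I + D^{-1/2} A D^{-1/2})^p with a >= 2,
    # normalised so that the average prior variance is 1
    a, p, noise = 2.0, 10, 0.1
    deg = np.maximum(A.sum(1), 1.0)
    S = A / np.sqrt(np.outer(deg, deg))
    C = np.linalg.matrix_power((a - 1) * np.eye(V) + S, p)
    C /= np.mean(np.diag(C))

    def learning_curve_point(n, n_datasets=100):
        # Bayes Error = posterior variance averaged over nodes and over random datasets
        eps = []
        for _ in range(n_datasets):
            idx = rng.choice(V, size=n, replace=False)
            K = C[np.ix_(idx, idx)] + noise**2 * np.eye(n)
            G = np.linalg.solve(K, C[idx])            # K^{-1} C[idx, :]
            post_var = np.diag(C) - np.sum(C[idx] * G, axis=0)
            eps.append(post_var.mean())
        return np.mean(eps)

    for n in [5, 20, 80, 160]:
        print(n, learning_curve_point(n))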

  • Kernels and learning curves for Gaussian process regression on random graphs
    Neural Information Processing Systems, 2009
    Co-Authors: Peter Sollich, Matthew Urry, Camille Coti
    Abstract:

    We investigate how well Gaussian process regression can learn functions defined on graphs, using large regular random graphs as a paradigmatic example. Random-walk based kernels are shown to have some non-trivial properties: within the standard approximation of a locally tree-like graph structure, the kernel does not become constant, i.e. neighbouring function values do not become fully correlated, when the lengthscale σ of the kernel is made large. Instead the kernel attains a non-trivial limiting form, which we calculate. The fully correlated limit is reached only once loops become relevant, and we estimate where the crossover to this regime occurs. Our main subject is the learning curve of Bayes Error versus training set size. We show that these curves are qualitatively well predicted by a simple approximation using only the spectrum of a large tree as input, and generically scale with n/V, the number of training examples per vertex. We also explore how this behaviour changes for kernel lengthscales that are large enough for loops to become important.
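    The non-trivial large-lengthscale behaviour mentioned above is easy to probe numerically: on a large regular random graph, one can track the correlation K_ij / sqrt(K_ii K_jj) between neighbouring vertices as the number of random-walk steps p (which sets the lengthscale) grows, and compare the tree-like plateau with the loop-induced crossover towards full correlation. The sketch below is an illustrative setup, assuming a networkx-generated regular graph; the degree, graph size and kernel parameter a are assumptions, not the paper's exact values.

    import numpy as np
    import networkx as nx

    d, V, a = 3, 1000, 2.0                          # degree, number of vertices, kernel parameter (a >= 2)
    G = nx.random_regular_graph(d, V, seed=0)
    A = nx.to_numpy_array(G)
    S = A / d                                       # D^{-1/2} A D^{-1/2} for a d-regular graph
    i, j = next(iter(G.edges()))                    # an arbitrary edge of the graph

    for p in [2, 8, 32, 128]:
        K = np.linalg.matrix_power((a - 1) * np.eye(V) + S, p)
        K /= np.mean(np.diag(K))                    # normalise the prior variance to 1
        corr = K[i, j] / np.sqrt(K[i, i] * K[j, j]) # correlation of neighbouring function values
        print(p, round(corr, 3))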

Mukund Padmanabhan - One of the best experts on this subject based on the ideXlab platform.

  • Minimum Bayes Error feature selection for continuous speech recognition
    2020
    Co-Authors: George Saon, Mukund Padmanabhan
    Abstract:

    We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes Error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word Error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.

  • Minimum Bayes Error feature selection for continuous speech recognition
    Neural Information Processing Systems, 2000
    Co-Authors: George Saon, Mukund Padmanabhan
    Abstract:

    We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes Error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word Error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
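    A minimal sketch of the second avenue (minimising the union Bhattacharyya bound in the range of θ), assuming Gaussian class-conditional densities and a generic derivative-free optimiser; the toy dimensions, number of classes and synthetic class statistics are illustrative assumptions rather than the authors' experimental setup. Since the bound depends only on the row space of θ, no orthonormality constraint is imposed here.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n_dim, p_dim, n_classes = 6, 2, 3
    means = [rng.normal(size=n_dim) for _ in range(n_classes)]
    covs = []
    for _ in range(n_classes):
        M = rng.normal(size=(n_dim, n_dim))
        covs.append(M @ M.T / n_dim + np.eye(n_dim))         # SPD class-conditional covariance
    priors = np.full(n_classes, 1.0 / n_classes)

    def bhattacharyya(mu1, S1, mu2, S2):
        # Bhattacharyya distance between two Gaussians
        S = 0.5 * (S1 + S2)
        dm = mu1 - mu2
        return (0.125 * dm @ np.linalg.solve(S, dm)
                + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

    def union_bound(theta_flat):
        # union Bhattacharyya bound on the Bayes Error of the projected classes
        theta = theta_flat.reshape(p_dim, n_dim)
        bound = 0.0
        for i in range(n_classes):
            for j in range(i + 1, n_classes):
                B = bhattacharyya(theta @ means[i], theta @ covs[i] @ theta.T,
                                  theta @ means[j], theta @ covs[j] @ theta.T)
                bound += np.sqrt(priors[i] * priors[j]) * np.exp(-B)
        return bound

    theta0 = rng.normal(size=p_dim * n_dim)
    res = minimize(union_bound, theta0, method='Nelder-Mead', options={'maxiter': 5000})
    print('minimised union Bhattacharyya bound on the Bayes Error:', res.fun)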

Zhixin Zhou - One of the best experts on this subject based on the ideXlab platform.

  • Rate-optimal Chernoff bound and application to community detection in the stochastic block models
    Electronic Journal of Statistics, 2020
    Co-Authors: Zhixin Zhou
    Abstract:

    The Chernoff coefficient is known to be an upper bound on the Bayes Error probability in classification problems. In this paper, we develop a rate-optimal Chernoff bound on the Bayes Error probability. The new bound is not only an upper bound but also a lower bound on the Bayes Error probability up to a constant factor. Moreover, we apply this result to community detection in stochastic block models. As a clustering problem, the optimal misclassification rate of the community detection problem can be characterized by our rate-optimal Chernoff bound. This can be formalized by deriving a minimax Error rate over a certain parameter space of stochastic block models, and then achieving such an Error rate by a feasible algorithm employing multiple steps of EM-type updates.
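    For intuition, the sketch below evaluates the classical Chernoff bound numerically for two univariate Gaussian classes with equal priors: the Chernoff coefficient min_s ∫ p^s q^(1-s) upper-bounds twice the Bayes Error, and the paper's contribution is that a quantity of this type also lower-bounds it up to a constant factor. The class densities, priors and integration grid are illustrative assumptions.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize_scalar

    p = norm(0.0, 1.0).pdf                 # class 1 density
    q = norm(2.0, 1.5).pdf                 # class 2 density
    x = np.linspace(-20, 20, 400001)
    dx = x[1] - x[0]

    def chernoff_coefficient(s):
        # ∫ p(x)^s q(x)^(1-s) dx, evaluated on a fine grid
        return np.sum(p(x)**s * q(x)**(1 - s)) * dx

    res = minimize_scalar(chernoff_coefficient, bounds=(1e-6, 1 - 1e-6), method='bounded')
    bayes_error = 0.5 * np.sum(np.minimum(p(x), q(x))) * dx   # exact Bayes Error, equal priors

    print('Chernoff upper bound (s* = %.3f):' % res.x, 0.5 * res.fun)
    print('exact Bayes Error (numerical):   ', bayes_error)
    print('ratio bound / Bayes Error:       ', 0.5 * res.fun / bayes_error)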

  • Non-asymptotic Chernoff lower bound and its application to community detection in stochastic block model
    arXiv: Statistics Theory, 2018
    Co-Authors: Zhixin Zhou
    Abstract:

    The Chernoff coefficient is an upper bound on the Bayes Error probability in classification problems. In this paper, we develop a sharp Chernoff-type bound on the Bayes Error probability. The new bound is not only an upper bound but also a lower bound on the Bayes Error probability, up to a constant, in a non-asymptotic setting. Moreover, we apply this result to community detection in the stochastic block model. As a clustering problem, the optimal Error rate of community detection can be characterized by our Chernoff-type bound. This can be formalized by deriving a minimax Error rate over a certain class of parameter spaces, and then achieving such an Error rate by a feasible algorithm employing multiple steps of EM-type updates.

Frank Nielsen - One of the best experts on this subject based on the ideXlab platform.

  • Computational Information Geometry for Binary Classification of High-Dimensional Random Tensors
    Entropy, 2018
    Co-Authors: Gia-thuy Pham, Rémy Boyer, Frank Nielsen
    Abstract:

    Evaluating the performance of Bayesian classification in a high-dimensional random tensor is a fundamental problem, usually difficult and under-studied. In this work, we consider two Signal to Noise Ratio (SNR)-based binary classification problems of interest. Under the alternative hypothesis, i.e., for a non-zero SNR, the observed signals are either a noisy rank-R tensor admitting a Q-order Canonical Polyadic Decomposition (CPD) with large factors of size N_q × R, 1 ≤ q ≤ Q, where R, N_q → ∞ with R^{1/q}/N_q converging towards a finite constant, or a noisy tensor admitting a Tucker Decomposition (TKD) of multilinear (M_1, ..., M_Q)-rank with large factors of size N_q × M_q, 1 ≤ q ≤ Q, where N_q, M_q → ∞ with M_q/N_q converging towards a finite constant. The classification of the random entries (coefficients) of the core tensor in the CPD/TKD is hard to study since the exact derivation of the minimal Bayes' Error probability is mathematically intractable. To circumvent this difficulty, the Chernoff Upper Bound (CUB) for larger SNR and the Fisher information at low SNR are derived and studied, based on information geometry theory. The tightest CUB is reached for the value minimizing the Error exponent, denoted by s. In general, due to the asymmetry of the s-divergence, the Bhattacharyya Upper Bound (BUB) (that is, the Chernoff Information calculated at s = 1/2) cannot solve this problem effectively. As a consequence, we rely on a costly numerical optimization strategy to find s. However, thanks to powerful random matrix theory tools, a simple analytical expression of s is provided with respect to the Signal to Noise Ratio (SNR) in the two schemes considered. This work shows that the BUB is the tightest bound at low SNRs. However, for higher SNRs, the latter property is no longer true.
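    A small sketch of the BUB-versus-CUB comparison in a much simplified zero-mean Gaussian detection setting (noise-only covariance under H0, low-rank signal plus noise under H1, equal priors): the coefficient c(s) = ∫ p0^s p1^(1-s) has a closed form for zero-mean Gaussians, and the optimal s is found by numerical optimisation. The dimension, rank and SNR value are illustrative assumptions; this is not the random-tensor model of the paper itself.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(0)
    d, snr = 8, 2.0
    Sigma0 = np.eye(d)                                    # H0: noise only
    U = rng.normal(size=(d, 2))
    Sigma1 = np.eye(d) + snr * (U @ U.T) / 2.0            # H1: low-rank signal plus noise

    def chernoff_coefficient(s):
        # c(s) = ∫ p0(x)^s p1(x)^(1-s) dx for zero-mean Gaussians p0 ~ N(0, Sigma0), p1 ~ N(0, Sigma1)
        P0, P1 = np.linalg.inv(Sigma0), np.linalg.inv(Sigma1)
        _, ld0 = np.linalg.slogdet(Sigma0)
        _, ld1 = np.linalg.slogdet(Sigma1)
        _, ldm = np.linalg.slogdet(s * P0 + (1 - s) * P1)
        return np.exp(-0.5 * (s * ld0 + (1 - s) * ld1 + ldm))

    res = minimize_scalar(chernoff_coefficient, bounds=(1e-6, 1 - 1e-6), method='bounded')
    print('BUB, s = 1/2:  ', 0.5 * chernoff_coefficient(0.5))   # equal priors
    print('CUB, s* = %.3f:' % res.x, 0.5 * res.fun)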

  • Generalized Bhattacharyya and Chernoff upper bounds on Bayes Error using quasi-arithmetic means
    Pattern Recognition Letters, 2014
    Co-Authors: Frank Nielsen
    Abstract:

    Bayesian classification labels observations based on given prior information, namely class a priori and class-conditional probabilities. Bayes' risk is the minimum expected classification cost that is achieved by the Bayes' test, the optimal decision rule. When no cost is incurred for correct classification and unit cost is charged for misclassification, Bayes' test reduces to the maximum a posteriori decision rule, and Bayes' risk simplifies to the Bayes Error, the probability of Error. Since calculating this probability of Error is often intractable, several techniques have been devised to bound it with closed-form formulas, thereby introducing measures of similarity and divergence between distributions like the Bhattacharyya coefficient and its associated Bhattacharyya distance. The Bhattacharyya upper bound can further be tightened using the Chernoff information that relies on the notion of best Error exponent. In this paper, we first express Bayes' risk using the total variation distance on scaled distributions. We then elucidate and extend the Bhattacharyya and the Chernoff upper bound mechanisms using generalized weighted means. We provide as a byproduct novel notions of statistical divergences and affinity coefficients. We illustrate our technique by deriving new upper bounds for the univariate Cauchy and the multivariate t-distributions, and show experimentally that those bounds are not too distant from the computationally intractable Bayes Error.
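    As a small numerical illustration of the kind of bound discussed above, the sketch below compares the classical Bhattacharyya upper bound, P_e ≤ sqrt(π1 π2) ∫ sqrt(p1 p2), with the Bayes Error obtained by directly integrating the pointwise minimum of the prior-scaled densities, for two univariate Cauchy distributions (one of the families treated in the paper). The particular location/scale parameters, priors and integration grid are illustrative assumptions.

    import numpy as np
    from scipy.stats import cauchy

    pi1, pi2 = 0.3, 0.7                                  # class priors
    p1 = cauchy(loc=0.0, scale=1.0).pdf
    p2 = cauchy(loc=3.0, scale=2.0).pdf

    x = np.linspace(-200, 200, 2000001)                  # wide grid: Cauchy tails are heavy
    dx = x[1] - x[0]
    bayes_error = np.sum(np.minimum(pi1 * p1(x), pi2 * p2(x))) * dx
    bhatt_bound = np.sqrt(pi1 * pi2) * np.sum(np.sqrt(p1(x) * p2(x))) * dx

    print('Bayes Error (numerical integration):', bayes_error)
    print('Bhattacharyya upper bound:          ', bhatt_bound)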

  • Bhattacharyya clustering with applications to mixture simplifications
    International Conference on Pattern Recognition, 2010
    Co-Authors: Frank Nielsen, Sylvain Boltz, Olivier Schwander
    Abstract:

    The Bhattacharyya distance (BD) is a widely used distance in statistics to compare probability density functions (PDFs). It has strong statistical properties (in terms of Bayes Error) and it relates to Fisher information. It also has practical advantages, since it is based on measuring the overlap of the supports of the PDFs. Unfortunately, even with common parametric models of PDFs, few closed-form formulas are known. Moreover, BD centroid estimation was limited to univariate Gaussian PDFs in the literature and no convergence guarantees were provided. In this paper, we propose a closed-form formula for the BD on a general class of parametric distributions named exponential families. We show that the BD is a Burbea-Rao divergence for the log-normalizer of the exponential family. We propose an efficient iterative scheme to compute a BD centroid on exponential families. Finally, these results allow us to define a Bhattacharyya hierarchical clustering algorithm (BHC). It can be viewed as a generalization of k-means on the BD. Results on image segmentation show the stability of the method.
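    The closed-form result described above can be checked directly in a simple case: for an exponential family with log-normalizer F, the Bhattacharyya distance equals the Jensen (Burbea-Rao) divergence (F(θ1) + F(θ2))/2 - F((θ1 + θ2)/2) of the natural parameters. The sketch below does this for univariate Gaussians and cross-checks against the classical Gaussian formula; the parameter values are illustrative assumptions.

    import numpy as np

    def natural_params(mu, sigma2):
        # natural parameters of N(mu, sigma2) as an exponential family
        return np.array([mu / sigma2, -0.5 / sigma2])

    def log_normalizer(theta):
        # log-normalizer F of the univariate Gaussian family
        t1, t2 = theta
        return -t1**2 / (4 * t2) + 0.5 * np.log(np.pi / (-t2))

    def bd_exponential_family(theta1, theta2):
        # Bhattacharyya distance as the Jensen (Burbea-Rao) divergence of F
        return 0.5 * (log_normalizer(theta1) + log_normalizer(theta2)) \
               - log_normalizer(0.5 * (theta1 + theta2))

    def bd_gaussian_closed_form(mu1, s1, mu2, s2):
        # classical closed form for two univariate Gaussians (s1, s2 are variances)
        sbar = 0.5 * (s1 + s2)
        return (mu1 - mu2)**2 / (8 * sbar) + 0.5 * np.log(sbar / np.sqrt(s1 * s2))

    mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 3.0
    print(bd_exponential_family(natural_params(mu1, s1), natural_params(mu2, s2)))
    print(bd_gaussian_closed_form(mu1, s1, mu2, s2))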

George Saon - One of the best experts on this subject based on the ideXlab platform.

  • Minimum Bayes Error feature selection for continuous speech recognition
    2020
    Co-Authors: George Saon, Mukund Padmanabhan
    Abstract:

    We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes Error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word Error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.

  • Minimum Bayes Error feature selection for continuous speech recognition
    Neural Information Processing Systems, 2000
    Co-Authors: George Saon, Mukund Padmanabhan
    Abstract:

    We consider the problem of designing a linear transformation θ ∈ R^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes Error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word Error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.