The experts below are selected from a list of 360 experts worldwide, ranked by the ideXlab platform.
Nishant A Mehta  One of the best experts on this subject based on the ideXlab platform.

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Algorithmic Learning Theory, 2019. Co-authors: Peter Grunwald, Nishant A Mehta.
Abstract: We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian complexity), $\mathrm{KL}(\text{posterior} \,\Vert\, \text{prior})$. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity, via Rademacher complexity, to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC classes and large (polynomial-entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.
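For orientation, the two endpoint complexities the abstract interpolates between can be written out explicitly. These are the standard textbook definitions, not quotations from the paper, for a function class $\mathcal{F}$ and sample size $n$:

```latex
% (data-independent) Rademacher complexity:
\mathcal{R}_n(\mathcal{F})
  = \mathbb{E}_{\sigma}\Bigl[\,\sup_{f \in \mathcal{F}}
      \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(Z_i)\Bigr],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{-1,+1\}

% (data-dependent) information / PAC-Bayesian complexity of a
% posterior \hat\Pi relative to a prior \Pi:
\mathrm{KL}(\hat\Pi \,\Vert\, \Pi)
  = \mathbb{E}_{f \sim \hat\Pi}\Bigl[\log \frac{d\hat\Pi}{d\Pi}(f)\Bigr]
```

The paper's new complexity sits between these: it is bounded above by the first for (penalized) ERM and by the second for generalized Bayesian estimators.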

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
arXiv: Learning, 2017. Co-authors: Peter Grunwald, Nishant A Mehta. Preprint version of the Algorithmic Learning Theory 2019 paper above; the abstract is identical.
Andreas Maurer

Uniform Concentration and Symmetrization for Weak Interactions
Conference on Learning Theory, 2019. Co-authors: Andreas Maurer, Massimiliano Pontil.
Abstract: The method to derive uniform bounds with Gaussian and Rademacher complexities is extended to the case where the sample average is replaced by a nonlinear statistic. Tight bounds are obtained for U-statistics, smoothed L-statistics, and error functionals of $\ell_2$-regularized algorithms.

A Vector-Contraction Inequality for Rademacher Complexities
Algorithmic Learning Theory, 2016. Co-authors: Andreas Maurer.
Abstract: The contraction inequality for Rademacher averages is extended to Lipschitz functions with vector-valued domains, and it is also shown that in the bounding expression the Rademacher variables can be replaced by arbitrary i.i.d. symmetric and sub-Gaussian variables. Example applications are given for multi-category learning, K-means clustering, and learning-to-learn.
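As a reference point, the inequality in question can be stated roughly as follows. This is a paraphrase of the commonly cited form, not a quotation from the paper: for functions $g_i : \mathbb{R}^K \to \mathbb{R}$ that are $L$-Lipschitz with respect to the Euclidean norm, a class $\mathcal{F}$ of $\mathbb{R}^K$-valued functions $f = (f_1, \dots, f_K)$, and i.i.d. Rademacher variables $\sigma_i$ and $\sigma_{ik}$,

```latex
\mathbb{E}_{\sigma}\,\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \sigma_i\, g_i\bigl(f(x_i)\bigr)
  \;\le\; \sqrt{2}\, L\;
\mathbb{E}_{\sigma}\,\sup_{f \in \mathcal{F}} \sum_{i=1}^{n}\sum_{k=1}^{K} \sigma_{ik}\, f_k(x_i)
```

That is, a Lipschitz composition over a vector-valued class is controlled by the Rademacher average of the coordinate functions, with an independent sign per coordinate.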

The Rademacher Complexity of Linear Transformation Classes
Conference on Learning Theory, 2006. Co-authors: Andreas Maurer.
Abstract: Bounds are given for the empirical and expected Rademacher complexity of classes of linear transformations from a Hilbert space H to a finite-dimensional space. The results imply generalization guarantees for graph regularization and multi-task subspace learning.
Peter Grunwald

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
Algorithmic Learning Theory, 2019. Co-authors: Peter Grunwald, Nishant A Mehta.
Abstract: We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian complexity), $\mathrm{KL}(\text{posterior} \,\Vert\, \text{prior})$. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity, via Rademacher complexity, to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC classes and large (polynomial-entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
arXiv: Learning, 2017. Co-authors: Peter Grunwald, Nishant A Mehta. Preprint version of the Algorithmic Learning Theory 2019 paper above; the abstract is identical.
Mehryar Mohri

On the Rademacher Complexity of Linear Hypothesis Sets
arXiv: Learning, 2020. Co-authors: Pranjal Awasthi, Natalie Frank, Mehryar Mohri.
Abstract: Linear predictors form a rich class of hypotheses used in a variety of learning algorithms. We present a tight analysis of the empirical Rademacher complexity of the family of linear hypothesis classes with weight vectors bounded in $\ell_p$-norm for any $p \geq 1$. This provides a tight analysis of generalization using these hypothesis sets and helps derive sharp data-dependent learning guarantees. We give both upper and lower bounds on the Rademacher complexity of these families and show that our bounds improve upon or match existing bounds, which are known only for $1 \leq p \leq 2$.
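The classical $p = 2$ case admits a closed-form supremum, which makes it easy to check numerically. The following is a minimal sketch (synthetic data and constants of my choosing, not taken from the paper): for $\mathcal{F} = \{x \mapsto \langle w, x\rangle : \|w\|_2 \le W\}$, the empirical Rademacher complexity is $\mathbb{E}_\sigma\,(W/n)\,\|\sum_i \sigma_i x_i\|_2$, which is bounded above by $(W/n)(\sum_i \|x_i\|_2^2)^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, W = 50, 5, 2.0          # sample size, dimension, weight-norm budget (illustrative)
X = rng.normal(size=(n, d))   # synthetic sample

# For the l2 ball {w : ||w||_2 <= W}, the sup over w of (1/n) sum_i s_i <w, x_i>
# has a closed form: (W/n) * || sum_i s_i x_i ||_2 (Cauchy-Schwarz is tight).
def sup_correlation(signs):
    return (W / n) * np.linalg.norm(signs @ X)

# Monte Carlo average over Rademacher sign vectors.
draws = 2000
signs = rng.choice([-1.0, 1.0], size=(draws, n))
emp_rad = np.mean([sup_correlation(s) for s in signs])

# Classical upper bound via Jensen: (W/n) * sqrt(sum_i ||x_i||_2^2).
bound = (W / n) * np.sqrt(np.sum(X ** 2))

print(f"empirical Rademacher complexity ~ {emp_rad:.4f}, bound = {bound:.4f}")
assert emp_rad <= bound
```

The gap between the Monte Carlo estimate and the bound is exactly the Jensen gap between $\mathbb{E}\|\cdot\|$ and $(\mathbb{E}\|\cdot\|^2)^{1/2}$; the paper's contribution concerns the harder regimes $p \neq 2$, which this sketch does not cover.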

Learning Kernels Using Local Rademacher Complexity
Neural Information Processing Systems, 2013. Co-authors: Corinna Cortes, Marius Kloft, Mehryar Mohri.
Abstract: We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion, which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning-kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning-kernel techniques, and another that can be formulated as a DC-programming problem, for which we describe a solution in detail. We also report the results of experiments with both algorithms in binary and multi-class classification tasks.

Rademacher Complexity Bounds for Non-i.i.d. Processes
Neural Information Processing Systems, 2008. Co-authors: Mehryar Mohri, Afshin Rostamizadeh.
Abstract: This paper presents the first Rademacher-complexity-based error bounds for non-i.i.d. settings, a generalization of similar existing bounds derived for the i.i.d. case. Our bounds hold in the scenario of dependent samples generated by a stationary β-mixing process, which is commonly adopted in many previous studies of non-i.i.d. settings. They benefit from the crucial advantages of Rademacher complexity over other measures of the complexity of hypothesis classes. In particular, they are data-dependent and measure the complexity of a class of hypotheses based on the training sample. The empirical Rademacher complexity can be estimated from such finite samples and leads to tighter generalization bounds. We also present the first margin bounds for kernel-based classification in this non-i.i.d. setting and briefly study their convergence.
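The claim that the empirical Rademacher complexity "can be estimated from such finite samples" is easy to illustrate, since the quantity conditions on the observed sample. Below is a minimal sketch on synthetic data (my own toy example, not from the paper, and using a fixed i.i.d.-style sample rather than a β-mixing process): for a small finite class of threshold classifiers, a Monte Carlo estimate over sign vectors is compared against Massart's finite-class bound $\sqrt{2 \ln|\mathcal{F}| / n}$.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 200
x = np.sort(rng.uniform(size=n))            # synthetic 1-d sample (illustrative)
thresholds = np.linspace(0.1, 0.9, 9)       # a small finite class of decision stumps

# Predictions f_t(x) = sign(x - t) in {-1, +1}, shape (|F|, n).
preds = np.where(x[None, :] > thresholds[:, None], 1.0, -1.0)

# Empirical Rademacher complexity: E_sigma sup_f (1/n) sum_i sigma_i f(x_i),
# approximated by averaging the per-draw supremum over the finite class.
draws = 3000
signs = rng.choice([-1.0, 1.0], size=(draws, n))
emp_rad = np.mean(np.max(signs @ preds.T, axis=1)) / n

# Massart's finite-class bound: sqrt(2 ln|F| / n) for {-1,+1}-valued classes.
massart = np.sqrt(2 * np.log(len(thresholds)) / n)

print(f"estimated complexity ~ {emp_rad:.4f}, Massart bound = {massart:.4f}")
assert emp_rad <= massart
```

The estimate sits well below the bound here because the stumps are highly correlated; this data-dependence is exactly what makes empirical Rademacher bounds tighter than worst-case combinatorial ones.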
Alexander Volberg

Rademacher Type and Enflo Type Coincide
Annals of Mathematics, 2020. Co-authors: Paata Ivanisvili, Ramon Van Handel, Alexander Volberg.
Abstract: A nonlinear analogue of the Rademacher type of a Banach space was introduced in classical work of Enflo. The key feature of Enflo type is that its definition uses only the metric structure of the Banach space, while the definition of Rademacher type relies on its linear structure. We prove that Rademacher type and Enflo type coincide, settling a long-standing open problem in Banach space theory. The proof is based on a novel dimension-free analogue of Pisier's inequality on the discrete cube.
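For context, the two notions can be written out as follows. These are the standard definitions, paraphrased rather than quoted from the paper: a Banach space $X$ has Rademacher type $p$ with constant $T$ if the first inequality holds for all $x_1, \dots, x_n \in X$, and Enflo type $p$ if the second holds for every function $f : \{-1,1\}^n \to X$, where $\varepsilon \oplus e_i$ denotes $\varepsilon$ with its $i$-th coordinate flipped:

```latex
% Rademacher type p (uses the linear structure):
\mathbb{E}_{\varepsilon}\Bigl\| \sum_{i=1}^{n} \varepsilon_i x_i \Bigr\|^{p}
  \;\le\; T^{p} \sum_{i=1}^{n} \| x_i \|^{p}

% Enflo type p (uses only the metric structure):
\mathbb{E}_{\varepsilon}\bigl\| f(\varepsilon) - f(-\varepsilon) \bigr\|^{p}
  \;\le\; T^{p} \sum_{i=1}^{n}
    \mathbb{E}_{\varepsilon}\bigl\| f(\varepsilon) - f(\varepsilon \oplus e_i) \bigr\|^{p}
```

Taking $f$ linear, $f(\varepsilon) = \sum_i \varepsilon_i x_i$, the factors of $2^p$ on both sides cancel and the Enflo inequality reduces exactly to the Rademacher one, which is why Enflo type is the nonlinear generalization; the theorem says the converse implication holds as well, up to the constant.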