Stochastic Approximation

The experts below are selected from a list of 66,702 experts worldwide, ranked by the ideXlab platform.

James C. Spall - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Implementation of Second-Order Stochastic Approximation Algorithms in High-Dimensional Problems
    IEEE Transactions on Neural Networks, 2020
    Co-Authors: Jingyi Zhu, Long Wang, James C. Spall
    Abstract:

    Stochastic approximation (SA) algorithms have been widely applied to minimization problems in which the loss function and/or the gradient information are accessible only through noisy evaluations. Stochastic gradient (SG) descent, a first-order algorithm and a workhorse of much of machine learning, is perhaps the most famous form of SA. Among all SA algorithms, the second-order simultaneous perturbation stochastic approximation (2SPSA) and the second-order stochastic gradient (2SG) algorithms are particularly efficient at handling high-dimensional problems, covering both gradient-free and gradient-based scenarios. However, due to the necessary matrix operations, the per-iteration floating-point-operation (FLOP) cost of standard 2SPSA/2SG is $O(p^{3})$, where $p$ is the dimension of the underlying parameter. Note that this $O(p^{3})$ FLOP cost is distinct from the classical SPSA-based per-iteration $O(1)$ cost in terms of the number of noisy function evaluations. In this work, we propose a technique to implement the 2SPSA/2SG algorithms efficiently via a symmetric indefinite matrix factorization and show that the FLOP cost is reduced from $O(p^{3})$ to $O(p^{2})$. The formal almost-sure convergence and rate of convergence of the newly proposed approach are directly inherited from standard 2SPSA/2SG. The improvement in efficiency and numerical stability is demonstrated in two numerical studies.
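The simultaneous-perturbation idea at the core of these methods is easy to sketch in code. Below is a minimal first-order SPSA loop (the basis that 2SPSA extends with Hessian estimation); the gain sequences, test function, and constants are illustrative choices, not the paper's:

```python
import random

def spsa_gradient(loss, theta, c=0.1, rng=random):
    """One simultaneous-perturbation gradient estimate.

    Only two noisy loss evaluations are needed, independent of the
    dimension p -- the per-iteration O(1) measurement cost noted above.
    """
    # Rademacher (+/-1) perturbation of every coordinate at once.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss(theta_plus) - loss(theta_minus)
    # Since delta_i = +/-1, 1/delta_i = delta_i.
    return [diff / (2.0 * c) * d for d in delta]

def spsa_minimize(loss, theta0, iters=2000, a=0.1, c=0.1, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iters + 1):
        g = spsa_gradient(loss, theta, c=c / k ** 0.101, rng=rng)
        ak = a / k ** 0.602  # standard SPSA gain-decay exponents
        theta = [t - ak * gi for t, gi in zip(theta, g)]
    return theta

# Noisy quadratic with minimum at (1, -2).
noise = random.Random(1)
loss = lambda th: (th[0] - 1) ** 2 + (th[1] + 2) ** 2 + 0.01 * noise.gauss(0, 1)
theta_hat = spsa_minimize(loss, [0.0, 0.0])
```

The point of the sketch is the measurement count: both loss evaluations perturb all coordinates simultaneously, so the cost per iteration does not grow with the dimension.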

  • Discrete Simultaneous Perturbation Stochastic Approximation on Loss Function with Noisy Measurements
    Proceedings of the 2011 American Control Conference, 2011
    Co-Authors: Qi Wang, James C. Spall
    Abstract:

    Consider the stochastic optimization of a loss function defined on a p-dimensional grid of points in Euclidean space. We introduce the middle-point discrete simultaneous perturbation stochastic approximation (DSPSA) algorithm for such discrete problems and show that convergence to the minimum is achieved. Consistent with other stochastic approximation methods, the method formally accommodates noisy measurements of the loss function.
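A minimal sketch of the middle-point idea, assuming a simple separable loss on the integer grid; the gain sequence and constants are illustrative, not the paper's exact specification:

```python
import math
import random

def dspsa_minimize(loss, theta0, iters=4000, a=0.5, A=50, seed=0):
    """Middle-point discrete SPSA sketch: theta moves in R^p, but the
    loss is only ever evaluated at integer grid points."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iters + 1):
        # Middle point of the unit cell containing theta.
        pi = [math.floor(t) + 0.5 for t in theta]
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        # pi +/- delta/2 are two opposite corners (grid points) of the cell.
        y_plus = loss([p + d / 2 for p, d in zip(pi, delta)])
        y_minus = loss([p - d / 2 for p, d in zip(pi, delta)])
        g = [(y_plus - y_minus) * d for d in delta]  # 1/delta_i = delta_i
        ak = a / (k + A)
        theta = [t - ak * gi for t, gi in zip(theta, g)]
    return theta

# Noisy separable quadratic on the integer grid, minimum at (3, -2).
noise = random.Random(1)
loss = lambda m: (m[0] - 3) ** 2 + (m[1] + 2) ** 2 + 0.01 * noise.gauss(0, 1)
theta_hat = dspsa_minimize(loss, [0.0, 0.0])
solution = [round(t) for t in theta_hat]
```

Note that only grid points are ever queried, yet the underlying iterate is real-valued and is rounded at the end.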

  • Robust Neural Network Tracking Controller Using Simultaneous Perturbation Stochastic Approximation
    IEEE Transactions on Neural Networks, 2008
    Co-Authors: Qing Song, James C. Spall, Jie Ni
    Abstract:

    This paper considers the design of robust neural network tracking controllers for nonlinear systems. The neural network is used in the closed-loop system to estimate the nonlinear system function. We introduce conic sector theory to establish a robust neural control system, with guaranteed boundedness of both the input/output (I/O) signals and the weights of the neural network. The neural network is trained by the simultaneous perturbation stochastic approximation (SPSA) method instead of the standard backpropagation (BP) algorithm. The proposed neural control system guarantees closed-loop stability of the estimation system as well as good tracking performance. The performance improvement over existing systems can be quantified in terms of preventing weight shifts, fast convergence, and robustness against system disturbances.

  • Adaptive Stochastic Approximation by the Simultaneous Perturbation Method
    IEEE Transactions on Automatic Control, 2000
    Co-Authors: James C. Spall
    Abstract:

    Stochastic approximation (SA) has long been applied to problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified and that can have a profound effect on algorithm performance. It is known that choosing these coefficients according to an SA analog of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. However, directly determining the required Hessian matrix (or Jacobian matrix for root finding) to achieve this form has often been difficult or impossible in practice. The paper presents a general adaptive SA algorithm based on a simple method for estimating the Hessian matrix while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings, and builds on the "simultaneous perturbation (SP)" idea introduced previously. The algorithm requires only a small number of loss-function or gradient measurements per iteration (independent of the problem dimension) to adaptively estimate the Hessian and the parameters of primary interest. Aside from introducing the adaptive SP approach, the paper presents practical implementation guidance, asymptotic theory, and a nontrivial numerical evaluation. Also included is a discussion and numerical analysis comparing the adaptive SP approach with the iterate-averaging approach to accelerated SA.
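The per-iteration Hessian estimate at the heart of the gradient-free adaptive scheme can be sketched as follows; the perturbation sizes, the plain running average, and the test problem are illustrative simplifications:

```python
import random

def hessian_estimate(loss, theta, c=0.1, ctil=0.1, rng=random):
    """One simultaneous-perturbation Hessian estimate from four noisy
    loss evaluations (a count independent of the dimension p)."""
    p = len(theta)
    delta = [rng.choice((-1.0, 1.0)) for _ in range(p)]
    dtil = [rng.choice((-1.0, 1.0)) for _ in range(p)]

    def y(shift):
        return loss([t + s for t, s in zip(theta, shift)])

    # Second difference across the two perturbation directions.
    dy = (y([c * d + ctil * dt for d, dt in zip(delta, dtil)])
          - y([c * d for d in delta])
          - y([-c * d + ctil * dt for d, dt in zip(delta, dtil)])
          + y([-c * d for d in delta]))
    scale = dy / (2.0 * c * ctil)
    # Symmetrized outer product; for +/-1 perturbations 1/delta_i = delta_i.
    return [[0.5 * scale * (delta[i] * dtil[j] + dtil[i] * delta[j])
             for j in range(p)] for i in range(p)]

# Average the per-iteration estimates for a noisy quadratic with
# Hessian diag(2, 6); the running mean converges toward it.
rng = random.Random(0)
noise = random.Random(1)
loss = lambda th: th[0] ** 2 + 3 * th[1] ** 2 + 0.001 * noise.gauss(0, 1)
p = 2
H_bar = [[0.0] * p for _ in range(p)]
N = 4000
for k in range(N):
    H_k = hessian_estimate(loss, [0.5, -0.5], rng=rng)
    for i in range(p):
        for j in range(p):
            H_bar[i][j] += (H_k[i][j] - H_bar[i][j]) / (k + 1)
```

Each single estimate is very noisy but unbiased for a quadratic loss; the adaptive algorithm averages these estimates across iterations while simultaneously updating the parameters.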

  • Adaptive Stochastic Approximation by the Simultaneous Perturbation Method
    Conference on Decision and Control, 1998
    Co-Authors: James C. Spall
    Abstract:

    Stochastic approximation (SA) has long been applied to problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified and that can have a profound effect on algorithm performance. It is known that picking these coefficients according to an SA analogue of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. This paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix at each iteration while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings and is based on the "simultaneous perturbation" idea introduced previously.

Francis Bach - One of the best experts on this subject based on the ideXlab platform.

  • Nonparametric Stochastic Approximation with Large Step-Sizes
    Annals of Statistics, 2016
    Co-Authors: Aymeric Dieuleveut, Francis Bach
    Abstract:

    We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS $\mathcal{H}$, even if the optimal predictor (i.e., the conditional expectation) is not in $\mathcal{H}$. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-squares algorithm (a form of stochastic gradient descent), given a sufficiently large step-size, attains optimal rates of convergence in a variety of regimes for the smoothness of the optimal prediction function and of the functions in $\mathcal{H}$.
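The averaged constant ("large") step-size least-mean-squares recursion described above can be sketched in the finite-dimensional case; the data-generating model and the step-size are illustrative:

```python
import random

def averaged_lms(stream, gamma, dim):
    """Constant step-size least-mean-squares with Polyak-Ruppert
    averaging: the returned estimate is the mean of all iterates."""
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for n, (x, y) in enumerate(stream, start=1):
        err = y - sum(t * xi for t, xi in zip(theta, x))
        theta = [t + gamma * err * xi for t, xi in zip(theta, x)]
        # Running average of the iterates.
        theta_bar = [tb + (t - tb) / n for tb, t in zip(theta_bar, theta)]
    return theta_bar

# Stream from a linear model y = 2*x1 - 1*x2 + noise.
rng = random.Random(0)
def make_stream(n):
    for _ in range(n):
        x = (1.0, rng.uniform(-1.0, 1.0))
        y = 2.0 * x[0] - 1.0 * x[1] + 0.1 * rng.gauss(0, 1)
        yield x, y

theta_bar = averaged_lms(make_stream(20000), gamma=0.25, dim=2)
```

The individual iterates never converge under a constant step-size; it is the averaging that turns them into a consistent estimator, which is what allows step-sizes much larger than the classical decaying schedules.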

  • Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
    Neural Information Processing Systems, 2011
    Co-Authors: Eric Moulines, Francis Bach
    Abstract:

    We consider the minimization of a convex objective function defined on a Hilbert space that is available only through unbiased estimates of its gradients. This problem includes standard machine learning tasks such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a non-asymptotic analysis of the convergence of two well-known algorithms: stochastic gradient descent (a.k.a. the Robbins-Monro algorithm) and a simple modification in which the iterates are averaged (a.k.a. Polyak-Ruppert averaging). Our analysis suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate in the strongly convex case, is not robust to a lack of strong convexity or to the setting of the proportionality constant. This situation is remedied by using slower decays together with averaging, which robustly leads to the optimal rate of convergence. We illustrate our theoretical results with simulations on synthetic and standard datasets.
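The robustness point can be illustrated with a scalar experiment, assuming a weakly curved strongly convex objective; all constants here are made up for illustration:

```python
import random

def sgd_scalar(grad, step, iters, theta0=0.0, seed=0, average=False):
    """Scalar Robbins-Monro recursion with optional Polyak-Ruppert
    averaging of the iterates."""
    rng = random.Random(seed)
    theta, theta_bar = theta0, 0.0
    for n in range(1, iters + 1):
        g = grad(theta) + 0.3 * rng.gauss(0, 1)  # unbiased noisy gradient
        theta -= step(n) * g
        theta_bar += (theta - theta_bar) / n     # running average
    return theta_bar if average else theta

# Objective 0.5*mu*(theta - 5)^2 with small curvature mu = 0.1.
mu = 0.1
grad = lambda th: mu * (th - 5.0)

# gamma_n = 1/n with a mis-set constant: provably slow (rate n^{-mu}),
# so after 10,000 steps the iterate is still far from the optimum 5.
theta_fast_decay = sgd_scalar(grad, lambda n: 1.0 / n, 10000)

# gamma_n = 1/sqrt(n) plus averaging: robust without knowing mu.
theta_avg = sgd_scalar(grad, lambda n: 1.0 / n ** 0.5, 10000, average=True)
```

With the 1/n rate and a constant not matched to the curvature, the transient decays only like n^{-0.1}; the slower 1/sqrt(n) decay forgets the initial condition quickly, and averaging removes the extra variance.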

Faming Liang - One of the best experts on this subject based on the ideXlab platform.

  • Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule
    Journal of the American Statistical Association, 2014
    Co-Authors: Faming Liang, Yichen Cheng, Guang Lin
    Abstract:

    Simulated annealing has been widely used in the solution of optimization problems. As is well known, simulated annealing cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used; however, the logarithmic schedule is so slow that it is rarely affordable in practice. This article proposes a new stochastic optimization algorithm, the simulated stochastic approximation annealing (SAA) algorithm, which combines simulated annealing with the stochastic approximation Monte Carlo (SAMC) algorithm. Under the stochastic approximation framework, it is shown that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic schedule, for example, a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on several benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm can significantly outperform simulated annealing and other competitors. Supplementary materials for this article are available online.
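The square-root cooling schedule itself is easy to illustrate with a plain Metropolis annealing loop. Note this is only a sketch of the cooling idea, not the full SAA algorithm, which additionally adapts SAMC-style weights; the test function and constants are illustrative:

```python
import math
import random

def anneal_sqrt(energy, x0, iters=20000, t0=1.0, prop_sd=1.0, seed=0):
    """Metropolis annealing with the square-root schedule t_k = t0/sqrt(k)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(1, iters + 1):
        t = t0 / math.sqrt(k)  # square-root cooling schedule
        x_new = x + rng.gauss(0, prop_sd)
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Metropolis prob.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Tilted double-well energy: local minimum near x = +1, global near x = -1.
energy = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
best_x, best_e = anneal_sqrt(energy, x0=0.0)
```

Under the classical theory, a schedule this fast forfeits the logarithmic-schedule guarantee; the contribution of the paper is that the SAMC-weighted version recovers the guarantee at this speed.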

  • A Resampling-Based Stochastic Approximation Method for Analysis of Large Geostatistical Data
    Journal of the American Statistical Association, 2013
    Co-Authors: Faming Liang, Yichen Cheng, Qifan Song, Jincheol Park, Ping Yang
    Abstract:

    The Gaussian geostatistical model has been widely used in the modeling of spatial data. However, it is computationally challenging to implement because it requires the inversion of a large covariance matrix, particularly when the number of observations is large. This article proposes a resampling-based stochastic approximation method to address this challenge. At each iteration of the proposed method, a small subsample is drawn from the full dataset, and the current estimate of the parameters is updated accordingly within the stochastic approximation framework. Since the proposed method uses only a small proportion of the data at each iteration, it avoids inverting large covariance matrices and is thus scalable to large datasets. The proposed method also leads to a general parameter estimation approach, maximum mean log-likelihood estimation, which includes the popular maximum (log-)likelihood estimation (MLE) approach as a special case and is expected to play an important role ...
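The subsample-per-iteration idea can be sketched on a toy problem, here estimating a Gaussian mean rather than geostatistical covariance parameters; the gain sequence and sizes are illustrative:

```python
import random

def resampling_sa(data, subsample=50, iters=2000, a=2.0, A=10, seed=0):
    """Stochastic-approximation parameter update in which each iteration
    sees only a small resample of the full dataset."""
    rng = random.Random(seed)
    theta = 0.0
    for k in range(1, iters + 1):
        batch = rng.sample(data, subsample)
        # Mean log-likelihood gradient for a Gaussian mean (unit variance).
        g = sum(x - theta for x in batch) / subsample
        theta += (a / (k + A)) * g
    return theta

# Large synthetic dataset with true mean 3; only 50 points are touched
# per iteration, never all 100,000 at once.
rng = random.Random(1)
data = [3.0 + rng.gauss(0, 1) for _ in range(100000)]
theta_hat = resampling_sa(data)
```

In the geostatistical setting the same structure applies, except that the per-subsample gradient involves inverting only the small subsample's covariance matrix, which is what makes the method scalable.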

  • On the Use of Stochastic Approximation Monte Carlo for Monte Carlo Integration
    Statistics & Probability Letters, 2009
    Co-Authors: Faming Liang
    Abstract:

    The stochastic approximation Monte Carlo (SAMC) algorithm has recently been proposed in the literature as a dynamic optimization algorithm. In this paper, we show theoretically that the samples generated by SAMC can be used for Monte Carlo integration via a dynamically weighted estimator, by drawing on results from the literature on nonhomogeneous Markov chains. Our numerical results indicate that SAMC can yield significant savings over conventional Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, for problems in which the energy landscape is rugged.

Uday V Shanbhag - One of the best experts on this subject based on the ideXlab platform.

  • A Regularized Smoothing Stochastic Approximation (RSSA) Algorithm for Stochastic Variational Inequality Problems
    Winter Simulation Conference, 2013
    Co-Authors: Farzad Yousefian, A Nedic, Uday V Shanbhag
    Abstract:

    We consider a stochastic variational inequality (SVI) problem with a continuous and monotone mapping over a compact and convex set. Traditionally, stochastic approximation (SA) schemes for SVIs have relied on strong monotonicity and Lipschitzian properties of the underlying map. We present a regularized smoothed SA (RSSA) scheme wherein the step-size, smoothing, and regularization parameters are diminishing sequences. Under suitable assumptions on these sequences, we show that the algorithm generates iterates that converge to a solution in an almost-sure sense. Additionally, we provide rate estimates that relate the iterates to their counterparts derived from the Tikhonov trajectory associated with a deterministic problem.
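A sketch of the regularized update, using an illustrative monotone-but-not-strongly-monotone map (a rotation) and made-up gain exponents satisfying the diminishing conditions; this omits the smoothing component of the full RSSA scheme:

```python
import random

def regularized_sa(F, x0, iters=50000, seed=0, sigma=0.05):
    """Projected stochastic approximation for a monotone (but not
    strongly monotone) map, with diminishing step-size gamma_k and
    Tikhonov regularization eps_k chosen so that gamma_k/eps_k -> 0."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, iters + 1):
        gamma = k ** -0.7
        eps = k ** -0.25
        g = F(x)
        # Noisy evaluation of the regularized map F(x) + eps*x.
        x = [xi - gamma * (gi + eps * xi + sigma * rng.gauss(0, 1))
             for xi, gi in zip(x, g)]
        # Euclidean projection onto the box [-1, 1]^2.
        x = [max(-1.0, min(1.0, xi)) for xi in x]
    return x

# Rotation map F(x, y) = (y, -x): monotone but not strongly monotone;
# the unique solution of the VI over the box is the origin.  Plain SA
# (eps_k = 0) would circle around it without converging.
F = lambda x: (x[1], -x[0])
x_star = regularized_sa(F, [1.0, 1.0])
```

The rotation example shows why the vanishing regularization matters: it supplies the contraction that strong monotonicity would otherwise provide, while its decay ensures the limit solves the original, unregularized problem.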

  • Regularized Iterative Stochastic Approximation Methods for Stochastic Variational Inequality Problems
    IEEE Transactions on Automatic Control, 2013
    Co-Authors: Jayash Koshal, A Nedic, Uday V Shanbhag
    Abstract:

    We consider a Cartesian stochastic variational inequality problem with a monotone map. Monotone stochastic variational inequalities arise naturally, for instance, as the equilibrium conditions of monotone stochastic Nash games over continuous strategy sets or of multiuser stochastic optimization problems. We introduce two classes of stochastic approximation methods, each of which requires exactly one projection step at every iteration, and provide a convergence analysis for each. The first is a stochastic iterative Tikhonov regularization method, which updates the regularization parameter after every iteration. The second is a stochastic iterative proximal-point method, in which the centering term is updated after every iteration. The Cartesian structure lends itself to distributed multi-agent extensions, and conditions are provided for recovering global convergence in limited-coordination variants where agents choose their step-length sequences, regularization parameters, and centering parameters independently while meeting a suitable coordination requirement. We apply the proposed techniques and their limited-coordination versions to a stochastic networked rate allocation problem.

Krishanu Maulik - One of the best experts on this subject based on the ideXlab platform.

  • Stochastic Approximation with Random Step Sizes and Urn Models with Random Replacement Matrices Having Finite Mean
    Annals of Applied Probability, 2019
    Co-Authors: Ujan Gangopadhyay, Krishanu Maulik
    Abstract:

    The stochastic approximation algorithm is a useful technique that has long been exploited successfully in probability theory and statistics. The step sizes used in stochastic approximation are generally taken to be deterministic, and the same is true for the drift. However, the specific application of urn models with random replacement matrices motivates us to consider stochastic approximation in a setup where both the step sizes and the drift are random but the sequence is uniformly bounded. The problem becomes interesting when the negligibility conditions on the errors hold only in probability. We first prove a result on stochastic approximation in this setup, which is new in the literature. Then, as an application, we study urn models with random replacement matrices. In the urn model, the replacement matrices need be neither independent nor identically distributed. We assume only that the replacement matrices are independent of the color drawn in the same round, conditioned on the entire past. We relax the usual second-moment assumption on the replacement matrices in the literature and require only the first moment to be finite. We require the conditional expectation of the replacement matrix given the past to be close, in an appropriate sense, to an irreducible matrix. We do not require any of the matrices to be balanced or nonrandom. We prove convergence of the proportion vector, the composition vector, and the count vector in $L^{1}$, and hence in probability. It is worth noting that the related differential equation is of Lotka–Volterra type and can be analyzed directly.
