Stochastic Approximation

The experts below are selected from a list of 66,702 experts worldwide, ranked by the ideXlab platform.

James C. Spall - One of the best experts on this subject based on the ideXlab platform.

  • Efficient Implementation of Second-Order Stochastic Approximation Algorithms in High-Dimensional Problems
    IEEE Transactions on Neural Networks, 2020
    Co-Authors: Jingyi Zhu, Long Wang, James C. Spall
    Abstract:

    Stochastic approximation (SA) algorithms have been widely applied to minimization problems in which the loss function and/or the gradient information are accessible only through noisy evaluations. Stochastic gradient (SG) descent, a first-order algorithm and a workhorse of much of machine learning, is perhaps the most famous form of SA. Among all SA algorithms, the second-order simultaneous perturbation stochastic approximation (2SPSA) and the second-order stochastic gradient (2SG) algorithms are particularly efficient at handling high-dimensional problems, covering both gradient-free and gradient-based scenarios. However, due to the necessary matrix operations, the per-iteration floating-point-operation (FLOP) cost of standard 2SPSA/2SG is $O(p^{3})$, where $p$ is the dimension of the underlying parameter. Note that this $O(p^{3})$ FLOP cost is distinct from the classical SPSA-based per-iteration $O(1)$ cost in terms of the number of noisy function evaluations. In this work, we propose a technique to implement the 2SPSA/2SG algorithms efficiently via a symmetric indefinite matrix factorization and show that the FLOP cost is reduced from $O(p^{3})$ to $O(p^{2})$. The formal almost-sure convergence and rate of convergence of the newly proposed approach are directly inherited from standard 2SPSA/2SG. The improvement in efficiency and numerical stability is demonstrated in two numerical studies.
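The simultaneous-perturbation idea at the core of these methods is easy to sketch in code. Below is a minimal first-order SPSA loop (the basis that 2SPSA extends with Hessian estimation); the gain sequences, test function, and constants are illustrative choices, not the paper's:

```python
import random

def spsa_gradient(loss, theta, c=0.1, rng=random):
    """One simultaneous-perturbation gradient estimate.

    Only two noisy loss evaluations are needed, independent of the
    dimension p -- the per-iteration O(1) measurement cost noted above.
    """
    # Rademacher (+/-1) perturbation of every coordinate at once.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss(theta_plus) - loss(theta_minus)
    # Since delta_i = +/-1, 1/delta_i = delta_i.
    return [diff / (2.0 * c) * d for d in delta]

def spsa_minimize(loss, theta0, iters=2000, a=0.1, c=0.1, seed=0):
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iters + 1):
        g = spsa_gradient(loss, theta, c=c / k ** 0.101, rng=rng)
        ak = a / k ** 0.602  # standard SPSA gain-decay exponents
        theta = [t - ak * gi for t, gi in zip(theta, g)]
    return theta

# Noisy quadratic with minimum at (1, -2).
noise = random.Random(1)
loss = lambda th: (th[0] - 1) ** 2 + (th[1] + 2) ** 2 + 0.01 * noise.gauss(0, 1)
theta_hat = spsa_minimize(loss, [0.0, 0.0])
```

The point of the sketch is the measurement count: both loss evaluations perturb all coordinates simultaneously, so the cost per iteration does not grow with the dimension.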

  • Discrete Simultaneous Perturbation Stochastic Approximation on Loss Function with Noisy Measurements
    Proceedings of the 2011 American Control Conference, 2011
    Co-Authors: Qi Wang, James C. Spall
    Abstract:

    Consider the stochastic optimization of a loss function defined on a p-dimensional grid of points in Euclidean space. We introduce the middle-point discrete simultaneous perturbation stochastic approximation (DSPSA) algorithm for such discrete problems and show that convergence to the minimum is achieved. Consistent with other stochastic approximation methods, the method formally accommodates noisy measurements of the loss function.
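A minimal sketch of the middle-point idea, assuming a simple separable loss on the integer grid; the gain sequence and constants are illustrative, not the paper's exact specification:

```python
import math
import random

def dspsa_minimize(loss, theta0, iters=4000, a=0.5, A=50, seed=0):
    """Middle-point discrete SPSA sketch: theta moves in R^p, but the
    loss is only ever evaluated at integer grid points."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iters + 1):
        # Middle point of the unit cell containing theta.
        pi = [math.floor(t) + 0.5 for t in theta]
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        # pi +/- delta/2 are two opposite corners (grid points) of the cell.
        y_plus = loss([p + d / 2 for p, d in zip(pi, delta)])
        y_minus = loss([p - d / 2 for p, d in zip(pi, delta)])
        g = [(y_plus - y_minus) * d for d in delta]  # 1/delta_i = delta_i
        ak = a / (k + A)
        theta = [t - ak * gi for t, gi in zip(theta, g)]
    return theta

# Noisy separable quadratic on the integer grid, minimum at (3, -2).
noise = random.Random(1)
loss = lambda m: (m[0] - 3) ** 2 + (m[1] + 2) ** 2 + 0.01 * noise.gauss(0, 1)
theta_hat = dspsa_minimize(loss, [0.0, 0.0])
solution = [round(t) for t in theta_hat]
```

Note that only grid points are ever queried, yet the underlying iterate is real-valued and is rounded at the end.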

  • Robust Neural Network Tracking Controller Using Simultaneous Perturbation Stochastic Approximation
    IEEE Transactions on Neural Networks, 2008
    Co-Authors: Qing Song, James C. Spall, Jie Ni
    Abstract:

    This paper considers the design of robust neural network tracking controllers for nonlinear systems. The neural network is used in the closed-loop system to estimate the nonlinear system function. We introduce conic sector theory to establish a robust neural control system, with guaranteed boundedness of both the input/output (I/O) signals and the weights of the neural network. The neural network is trained by the simultaneous perturbation stochastic approximation (SPSA) method instead of the standard backpropagation (BP) algorithm. The proposed neural control system guarantees closed-loop stability of the estimation system as well as good tracking performance. The performance improvement over existing systems can be quantified in terms of preventing weight shifts, fast convergence, and robustness against system disturbances.

  • Adaptive Stochastic Approximation by the Simultaneous Perturbation Method
    IEEE Transactions on Automatic Control, 2000
    Co-Authors: James C. Spall
    Abstract:

    Stochastic approximation (SA) has long been applied to problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified and that can have a profound effect on algorithm performance. It is known that choosing these coefficients according to an SA analog of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. However, directly determining the required Hessian matrix (or Jacobian matrix for root finding) to achieve this form has often been difficult or impossible in practice. The paper presents a general adaptive SA algorithm based on a simple method for estimating the Hessian matrix while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings, and builds on the "simultaneous perturbation (SP)" idea introduced previously. The algorithm requires only a small number of loss-function or gradient measurements per iteration (independent of the problem dimension) to adaptively estimate the Hessian and the parameters of primary interest. Aside from introducing the adaptive SP approach, the paper presents practical implementation guidance, asymptotic theory, and a nontrivial numerical evaluation. Also included is a discussion and numerical analysis comparing the adaptive SP approach with the iterate-averaging approach to accelerated SA.
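The per-iteration Hessian estimate at the heart of the gradient-free adaptive scheme can be sketched as follows; the perturbation sizes, the plain running average, and the test problem are illustrative simplifications:

```python
import random

def hessian_estimate(loss, theta, c=0.1, ctil=0.1, rng=random):
    """One simultaneous-perturbation Hessian estimate from four noisy
    loss evaluations (a count independent of the dimension p)."""
    p = len(theta)
    delta = [rng.choice((-1.0, 1.0)) for _ in range(p)]
    dtil = [rng.choice((-1.0, 1.0)) for _ in range(p)]

    def y(shift):
        return loss([t + s for t, s in zip(theta, shift)])

    # Second difference across the two perturbation directions.
    dy = (y([c * d + ctil * dt for d, dt in zip(delta, dtil)])
          - y([c * d for d in delta])
          - y([-c * d + ctil * dt for d, dt in zip(delta, dtil)])
          + y([-c * d for d in delta]))
    scale = dy / (2.0 * c * ctil)
    # Symmetrized outer product; for +/-1 perturbations 1/delta_i = delta_i.
    return [[0.5 * scale * (delta[i] * dtil[j] + dtil[i] * delta[j])
             for j in range(p)] for i in range(p)]

# Average the per-iteration estimates for a noisy quadratic with
# Hessian diag(2, 6); the running mean converges toward it.
rng = random.Random(0)
noise = random.Random(1)
loss = lambda th: th[0] ** 2 + 3 * th[1] ** 2 + 0.001 * noise.gauss(0, 1)
p = 2
H_bar = [[0.0] * p for _ in range(p)]
N = 4000
for k in range(N):
    H_k = hessian_estimate(loss, [0.5, -0.5], rng=rng)
    for i in range(p):
        for j in range(p):
            H_bar[i][j] += (H_k[i][j] - H_bar[i][j]) / (k + 1)
```

Each single estimate is very noisy but unbiased for a quadratic loss; the adaptive algorithm averages these estimates across iterations while simultaneously updating the parameters.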

  • Adaptive Stochastic Approximation by the Simultaneous Perturbation Method
    Conference on Decision and Control, 1998
    Co-Authors: James C. Spall
    Abstract:

    Stochastic approximation (SA) has long been applied to problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified and that can have a profound effect on algorithm performance. It is known that picking these coefficients according to an SA analogue of the deterministic Newton-Raphson algorithm provides an optimal or near-optimal form of the algorithm. This paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix at each iteration while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer-Wolfowitz) and root-finding/stochastic gradient-based (Robbins-Monro) settings and is based on the "simultaneous perturbation" idea introduced previously.

Francis Bach - One of the best experts on this subject based on the ideXlab platform.

  • Nonparametric Stochastic Approximation with Large Step-Sizes
    Annals of Statistics, 2016
    Co-Authors: Aymeric Dieuleveut, Francis Bach
    Abstract:

    We consider the random-design least-squares regression problem within the reproducing kernel Hilbert space (RKHS) framework. Given a stream of independent and identically distributed input/output data, we aim to learn a regression function within an RKHS $\mathcal{H}$, even if the optimal predictor (i.e., the conditional expectation) is not in $\mathcal{H}$. In a stochastic approximation framework where the estimator is updated after each observation, we show that the averaged unregularized least-mean-squares algorithm (a form of stochastic gradient descent), given a sufficiently large step-size, attains optimal rates of convergence in a variety of regimes for the smoothness of the optimal prediction function and of the functions in $\mathcal{H}$.
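The averaged constant ("large") step-size least-mean-squares recursion described above can be sketched in the finite-dimensional case; the data-generating model and the step-size are illustrative:

```python
import random

def averaged_lms(stream, gamma, dim):
    """Constant step-size least-mean-squares with Polyak-Ruppert
    averaging: the returned estimate is the mean of all iterates."""
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for n, (x, y) in enumerate(stream, start=1):
        err = y - sum(t * xi for t, xi in zip(theta, x))
        theta = [t + gamma * err * xi for t, xi in zip(theta, x)]
        # Running average of the iterates.
        theta_bar = [tb + (t - tb) / n for tb, t in zip(theta_bar, theta)]
    return theta_bar

# Stream from a linear model y = 2*x1 - 1*x2 + noise.
rng = random.Random(0)
def make_stream(n):
    for _ in range(n):
        x = (1.0, rng.uniform(-1.0, 1.0))
        y = 2.0 * x[0] - 1.0 * x[1] + 0.1 * rng.gauss(0, 1)
        yield x, y

theta_bar = averaged_lms(make_stream(20000), gamma=0.25, dim=2)
```

The individual iterates never converge under a constant step-size; it is the averaging that turns them into a consistent estimator, which is what allows step-sizes much larger than the classical decaying schedules.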

  • Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
    Neural Information Processing Systems, 2011
    Co-Authors: Eric Moulines, Francis Bach
    Abstract:

    We consider the minimization of a convex objective function defined on a Hilbert space that is available only through unbiased estimates of its gradients. This problem includes standard machine learning tasks such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a non-asymptotic analysis of the convergence of two well-known algorithms: stochastic gradient descent (a.k.a. the Robbins-Monro algorithm) and a simple modification in which the iterates are averaged (a.k.a. Polyak-Ruppert averaging). Our analysis suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate in the strongly convex case, is not robust to a lack of strong convexity or to the setting of the proportionality constant. This situation is remedied by using slower decays together with averaging, which robustly leads to the optimal rate of convergence. We illustrate our theoretical results with simulations on synthetic and standard datasets.
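The robustness point can be illustrated with a scalar experiment, assuming a weakly curved strongly convex objective; all constants here are made up for illustration:

```python
import random

def sgd_scalar(grad, step, iters, theta0=0.0, seed=0, average=False):
    """Scalar Robbins-Monro recursion with optional Polyak-Ruppert
    averaging of the iterates."""
    rng = random.Random(seed)
    theta, theta_bar = theta0, 0.0
    for n in range(1, iters + 1):
        g = grad(theta) + 0.3 * rng.gauss(0, 1)  # unbiased noisy gradient
        theta -= step(n) * g
        theta_bar += (theta - theta_bar) / n     # running average
    return theta_bar if average else theta

# Objective 0.5*mu*(theta - 5)^2 with small curvature mu = 0.1.
mu = 0.1
grad = lambda th: mu * (th - 5.0)

# gamma_n = 1/n with a mis-set constant: provably slow (rate n^{-mu}),
# so after 10,000 steps the iterate is still far from the optimum 5.
theta_fast_decay = sgd_scalar(grad, lambda n: 1.0 / n, 10000)

# gamma_n = 1/sqrt(n) plus averaging: robust without knowing mu.
theta_avg = sgd_scalar(grad, lambda n: 1.0 / n ** 0.5, 10000, average=True)
```

With the 1/n rate and a constant not matched to the curvature, the transient decays only like n^{-0.1}; the slower 1/sqrt(n) decay forgets the initial condition quickly, and averaging removes the extra variance.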

Faming Liang - One of the best experts on this subject based on the ideXlab platform.

  • Simulated Stochastic Approximation Annealing for Global Optimization with a Square-Root Cooling Schedule
    Journal of the American Statistical Association, 2014
    Co-Authors: Faming Liang, Yichen Cheng, Guang Lin
    Abstract:

    Simulated annealing has been widely used in the solution of optimization problems. As is well known, simulated annealing cannot be guaranteed to locate the global optima unless a logarithmic cooling schedule is used; however, the logarithmic schedule is so slow that it is rarely affordable in practice. This article proposes a new stochastic optimization algorithm, the simulated stochastic approximation annealing (SAA) algorithm, which combines simulated annealing with the stochastic approximation Monte Carlo (SAMC) algorithm. Under the stochastic approximation framework, it is shown that the new algorithm can work with a cooling schedule in which the temperature decreases much faster than in the logarithmic schedule, for example, a square-root cooling schedule, while still guaranteeing that the global optima are reached as the temperature tends to zero. The new algorithm has been tested on several benchmark optimization problems, including feed-forward neural network training and protein folding. The numerical results indicate that the new algorithm can significantly outperform simulated annealing and other competitors. Supplementary materials for this article are available online.
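The square-root cooling schedule itself is easy to illustrate with a plain Metropolis annealing loop. Note this is only a sketch of the cooling idea, not the full SAA algorithm, which additionally adapts SAMC-style weights; the test function and constants are illustrative:

```python
import math
import random

def anneal_sqrt(energy, x0, iters=20000, t0=1.0, prop_sd=1.0, seed=0):
    """Metropolis annealing with the square-root schedule t_k = t0/sqrt(k)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(1, iters + 1):
        t = t0 / math.sqrt(k)  # square-root cooling schedule
        x_new = x + rng.gauss(0, prop_sd)
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Metropolis prob.
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Tilted double-well energy: local minimum near x = +1, global near x = -1.
energy = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
best_x, best_e = anneal_sqrt(energy, x0=0.0)
```

Under the classical theory, a schedule this fast forfeits the logarithmic-schedule guarantee; the contribution of the paper is that the SAMC-weighted version recovers the guarantee at this speed.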

  • A Resampling-Based Stochastic Approximation Method for Analysis of Large Geostatistical Data
    Journal of the American Statistical Association, 2013
    Co-Authors: Faming Liang, Yichen Cheng, Qifan Song, Jincheol Park, Ping Yang
    Abstract:

    The Gaussian geostatistical model has been widely used in the modeling of spatial data. However, it is computationally challenging to implement because it requires the inversion of a large covariance matrix, particularly when the number of observations is large. This article proposes a resampling-based stochastic approximation method to address this challenge. At each iteration of the proposed method, a small subsample is drawn from the full dataset, and the current estimate of the parameters is updated accordingly within the stochastic approximation framework. Since the proposed method uses only a small proportion of the data at each iteration, it avoids inverting large covariance matrices and is thus scalable to large datasets. The proposed method also leads to a general parameter estimation approach, maximum mean log-likelihood estimation, which includes the popular maximum (log-)likelihood estimation (MLE) approach as a special case and is expected to play an important role ...
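The subsample-per-iteration idea can be sketched on a toy problem, here estimating a Gaussian mean rather than geostatistical covariance parameters; the gain sequence and sizes are illustrative:

```python
import random

def resampling_sa(data, subsample=50, iters=2000, a=2.0, A=10, seed=0):
    """Stochastic-approximation parameter update in which each iteration
    sees only a small resample of the full dataset."""
    rng = random.Random(seed)
    theta = 0.0
    for k in range(1, iters + 1):
        batch = rng.sample(data, subsample)
        # Mean log-likelihood gradient for a Gaussian mean (unit variance).
        g = sum(x - theta for x in batch) / subsample
        theta += (a / (k + A)) * g
    return theta

# Large synthetic dataset with true mean 3; only 50 points are touched
# per iteration, never all 100,000 at once.
rng = random.Random(1)
data = [3.0 + rng.gauss(0, 1) for _ in range(100000)]
theta_hat = resampling_sa(data)
```

In the geostatistical setting the same structure applies, except that the per-subsample gradient involves inverting only the small subsample's covariance matrix, which is what makes the method scalable.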

  • On the Use of Stochastic Approximation Monte Carlo for Monte Carlo Integration
    Statistics & Probability Letters, 2009
    Co-Authors: Faming Liang
    Abstract:

    The stochastic approximation Monte Carlo (SAMC) algorithm has recently been proposed in the literature as a dynamic optimization algorithm. In this paper, we show theoretically that the samples generated by SAMC can be used for Monte Carlo integration via a dynamically weighted estimator, by drawing on results from the literature on nonhomogeneous Markov chains. Our numerical results indicate that SAMC can yield significant savings over conventional Monte Carlo algorithms, such as the Metropolis-Hastings algorithm, for problems in which the energy landscape is rugged.

Uday V Shanbhag - One of the best experts on this subject based on the ideXlab platform.

  • A Regularized Smoothing Stochastic Approximation (RSSA) Algorithm for Stochastic Variational Inequality Problems
    Winter Simulation Conference, 2013
    Co-Authors: Farzad Yousefian, A Nedic, Uday V Shanbhag
    Abstract:

    We consider a stochastic variational inequality (SVI) problem with a continuous and monotone mapping over a compact and convex set. Traditionally, stochastic approximation (SA) schemes for SVIs have relied on strong monotonicity and Lipschitzian properties of the underlying map. We present a regularized smoothed SA (RSSA) scheme wherein the step-size, smoothing, and regularization parameters are diminishing sequences. Under suitable assumptions on these sequences, we show that the algorithm generates iterates that converge to a solution in an almost-sure sense. Additionally, we provide rate estimates that relate the iterates to their counterparts derived from the Tikhonov trajectory associated with a deterministic problem.
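A sketch of the regularized update, using an illustrative monotone-but-not-strongly-monotone map (a rotation) and made-up gain exponents satisfying the diminishing conditions; this omits the smoothing component of the full RSSA scheme:

```python
import random

def regularized_sa(F, x0, iters=50000, seed=0, sigma=0.05):
    """Projected stochastic approximation for a monotone (but not
    strongly monotone) map, with diminishing step-size gamma_k and
    Tikhonov regularization eps_k chosen so that gamma_k/eps_k -> 0."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, iters + 1):
        gamma = k ** -0.7
        eps = k ** -0.25
        g = F(x)
        # Noisy evaluation of the regularized map F(x) + eps*x.
        x = [xi - gamma * (gi + eps * xi + sigma * rng.gauss(0, 1))
             for xi, gi in zip(x, g)]
        # Euclidean projection onto the box [-1, 1]^2.
        x = [max(-1.0, min(1.0, xi)) for xi in x]
    return x

# Rotation map F(x, y) = (y, -x): monotone but not strongly monotone;
# the unique solution of the VI over the box is the origin.  Plain SA
# (eps_k = 0) would circle around it without converging.
F = lambda x: (x[1], -x[0])
x_star = regularized_sa(F, [1.0, 1.0])
```

The rotation example shows why the vanishing regularization matters: it supplies the contraction that strong monotonicity would otherwise provide, while its decay ensures the limit solves the original, unregularized problem.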

  • Regularized Iterative Stochastic Approximation Methods for Stochastic Variational Inequality Problems
    IEEE Transactions on Automatic Control, 2013
    Co-Authors: Jayash Koshal, A Nedic, Uday V Shanbhag
    Abstract:

    We consider a Cartesian stochastic variational inequality problem with a monotone map. Monotone stochastic variational inequalities arise naturally, for instance, as the equilibrium conditions of monotone stochastic Nash games over continuous strategy sets or of multiuser stochastic optimization problems. We introduce two classes of stochastic approximation methods, each of which requires exactly one projection step at every iteration, and provide a convergence analysis for each. The first is a stochastic iterative Tikhonov regularization method, which updates the regularization parameter after every iteration. The second is a stochastic iterative proximal-point method, in which the centering term is updated after every iteration. The Cartesian structure lends itself to distributed multi-agent extensions, and conditions are provided for recovering global convergence in limited-coordination variants where agents choose their step-length sequences, regularization parameters, and centering parameters independently while meeting a suitable coordination requirement. We apply the proposed techniques and their limited-coordination versions to a stochastic networked rate allocation problem.

Krishanu Maulik - One of the best experts on this subject based on the ideXlab platform.

  • Stochastic Approximation with Random Step Sizes and Urn Models with Random Replacement Matrices Having Finite Mean
    Annals of Applied Probability, 2019
    Co-Authors: Ujan Gangopadhyay, Krishanu Maulik
    Abstract:

    The stochastic approximation algorithm is a useful technique that has long been exploited successfully in probability theory and statistics. The step sizes used in stochastic approximation are generally taken to be deterministic, and the same is true for the drift. However, the specific application of urn models with random replacement matrices motivates us to consider stochastic approximation in a setup where both the step sizes and the drift are random but the sequence is uniformly bounded. The problem becomes interesting when the negligibility conditions on the errors hold only in probability. We first prove a result on stochastic approximation in this setup, which is new in the literature. Then, as an application, we study urn models with random replacement matrices. In the urn model, the replacement matrices need be neither independent nor identically distributed. We assume only that the replacement matrices are independent of the color drawn in the same round, conditioned on the entire past. We relax the usual second-moment assumption on the replacement matrices in the literature and require only the first moment to be finite. We require the conditional expectation of the replacement matrix given the past to be close, in an appropriate sense, to an irreducible matrix. We do not require any of the matrices to be balanced or nonrandom. We prove convergence of the proportion vector, the composition vector, and the count vector in $L^{1}$, and hence in probability. It is worth noting that the related differential equation is of Lotka–Volterra type and can be analyzed directly.
