Sufficiently Small Step

14,000,000 Leading Edge Experts on the ideXlab platform

Scan Science and Technology

Contact Leading Edge Experts & Companies

The Experts below are selected from a list of 258 Experts worldwide, ranked by the ideXlab platform

Babak Hassibi - One of the best experts on this subject based on the ideXlab platform.

  • Robustifying Binary Classification to Adversarial Perturbation.
    arXiv: Learning, 2020
    Co-Authors: Fariborz Salehi, Babak Hassibi
    Abstract:

    Despite the enormous success of machine learning models in various applications, most of these models lack resilience to (even small) perturbations in their input data. Hence, new methods to robustify machine learning models seem essential. To this end, in this paper we consider the problem of binary classification with adversarial perturbations. By investigating the solution to a min-max optimization problem (which considers the worst-case loss in the presence of adversarial perturbations), we introduce a generalization of the max-margin classifier that takes into account the power of the adversary in manipulating the data. We refer to this classifier as the "Robust Max-margin" (RM) classifier. Under some mild assumptions on the loss function, we theoretically show that the gradient descent iterates (with Sufficiently Small Step size) converge to the RM classifier in direction. Therefore, the RM classifier can be studied to compute various performance measures (e.g. generalization error) of binary classification with adversarial perturbations.
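
    As a toy illustration of the convergence-in-direction phenomenon the abstract describes (this is the standard non-robust baseline the paper generalizes, not the RM construction itself; the dataset, step size, and iteration count are invented for the sketch), gradient descent on the logistic loss of a separable dataset drives the iterate toward a separating direction:

```python
import numpy as np

# Toy separable dataset: two Gaussian blobs (invented for this sketch).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

def logistic_grad(w):
    # gradient of sum_i log(1 + exp(-y_i * x_i . w))
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))
    return -(X * (y * s)[:, None]).sum(axis=0)

w = np.zeros(2)
eta = 0.05                          # a sufficiently small step size
for _ in range(5000):
    w -= eta * logistic_grad(w)

# On separable data the normalized iterate drifts toward the max-margin
# direction as training continues; here we only check that it separates
# the training set.
separated = bool(np.all(y * (X @ w) > 0))
```

    The norm of `w` keeps growing (the loss has no finite minimizer on separable data), which is why the interesting object is the iterate's direction rather than the iterate itself.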

  • A Study of Generalization of Stochastic Mirror Descent Algorithms on Overparameterized Nonlinear Models
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2020
    Co-Authors: Navid Azizan, Sahin Lale, Babak Hassibi
    Abstract:

    We study the convergence, the implicit regularization, and the generalization of stochastic mirror descent (SMD) algorithms in overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which define a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which global minima the SMD algorithms converge to. In this work, we first theoretically show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima, which is usually the case in the highly overparameterized setting, then with a Sufficiently Small Step size SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results are observed in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments, we show that on the CIFAR-10 dataset, SMD with an ℓ10 norm potential (as a surrogate for ℓ∞) consistently generalizes better than SGD (corresponding to an ℓ2 norm potential), which in turn consistently outperforms SMD with an ℓ1 norm potential.

  • Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
    arXiv: Learning, 2019
    Co-Authors: Navid Azizan Ruhi, Sahin Lale, Babak Hassibi
    Abstract:

    Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something that comes for free in the highly overparameterized case), SMD with Sufficiently Small Step size converges to a global minimum that is approximately the closest one in Bregman divergence. On the experimental side, our extensive experiments on standard datasets and models, using various initializations, various mirror descents, and various Bregman divergences, consistently confirm that this phenomenon happens in deep learning. Our experiments further indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms. Experimenting on a standard image dataset and network architecture with SMD under different kinds of implicit regularization ($\ell_1$ to encourage sparsity, $\ell_2$ yielding SGD, and $\ell_{10}$ to discourage large components in the parameter vector) consistently and definitively shows that $\ell_{10}$-SMD has better generalization performance than SGD, which in turn has better generalization performance than $\ell_1$-SMD.
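
    A minimal sketch of the SMD update with a q-norm potential on an overparameterized model (the deep-network setting of the paper is replaced here by linear regression, and q = 3, the dimensions, and the step size are arbitrary illustrative choices): the stochastic gradient step is taken in the mirror (dual) domain and mapped back through the inverse mirror map.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q, eta = 5, 20, 3, 0.005          # n < d: overparameterized
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_psi(w):
    # mirror map for the potential psi(w) = sum_j |w_j|^q / q
    return np.sign(w) * np.abs(w) ** (q - 1)

def grad_psi_inv(z):
    # inverse mirror map
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

w = 1e-3 * rng.normal(size=d)           # initialize near the origin
z = grad_psi(w)
for _ in range(100000):
    i = rng.integers(n)
    g = (X[i] @ w - y[i]) * X[i]        # stochastic gradient of 0.5*(x_i.w - y_i)^2
    z -= eta * g                        # SMD step in the dual domain
    w = grad_psi_inv(z)

residual = float(np.max(np.abs(X @ w - y)))
```

    With a sufficiently small step size the iterate settles on an interpolating solution (residual near zero); which of the infinitely many interpolating solutions it picks is governed by the potential ψ, which is the implicit-regularization effect the abstract describes. Setting q = 2 makes both maps the identity and recovers plain SGD.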

  • Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization
    arXiv: Learning, 2018
    Co-Authors: Navid Azizan Ruhi, Babak Hassibi
    Abstract:

    Stochastic descent methods (of the gradient and mirror varieties) have become increasingly popular in optimization. In fact, it is now widely recognized that the success of deep learning is not only due to the special deep architecture of the models, but also due to the behavior of the stochastic descent methods used, which play a key role in reaching "good" solutions that generalize well to unseen data. In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD) for the square loss of linear models---originally developed in the 1990's---and extend them to general stochastic mirror descent (SMD) algorithms for general loss functions and nonlinear models. In particular, we show that there is a fundamental identity which holds for SMD (and SGD) under very general conditions, and which implies the minimax optimality of SMD (and SGD) for Sufficiently Small Step size, and for a general class of loss functions and general nonlinear models. We further show that this identity can be used to naturally establish other properties of SMD (and SGD), namely convergence and implicit regularization for over-parameterized linear models (in what is now being called the "interpolating regime"), some of which have been shown in certain cases in prior literature. We also argue how this identity can be used in the so-called "highly over-parameterized" nonlinear setting (where the number of parameters far exceeds the number of data points) to provide insights into why SMD (and SGD) may have similar convergence and implicit regularization properties for deep learning.
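
    As a quick concrete check of the claim that SGD is the special case of SMD, one SMD step through the potential ψ(w) = ½‖w‖² reproduces the SGD step exactly, since that potential's mirror map is the identity (the data and step size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
w0 = rng.normal(size=4)
x, y, eta = rng.normal(size=4), 1.3, 0.05

grad = (x @ w0 - y) * x    # gradient of the square loss 0.5*(x.w - y)^2

# SGD step
w_sgd = w0 - eta * grad

# SMD step with psi(w) = 0.5*||w||^2: grad psi and its inverse are both
# the identity, so the dual-domain update coincides with the primal one.
z = w0 - eta * grad
w_smd = z
```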

Mingzhu Liu - One of the best experts on this subject based on the ideXlab platform.

  • Convergence and stability of the one-leg θ method for stochastic differential equations with piecewise continuous arguments
    Filomat, 2019
    Co-Authors: Minghui Song, Mingzhu Liu
    Abstract:

    An equivalence is established here between the stability of stochastic differential equations with piecewise continuous arguments (SDEPCAs) and that of the one-leg θ method applied to the SDEPCAs. Firstly, the convergence of the one-leg θ method for SDEPCAs under the global Lipschitz condition is proved. Secondly, it is proved that the SDEPCAs are pth (p ∈ (0,1)) moment exponentially stable if and only if the one-leg θ method is pth moment exponentially stable for some Sufficiently Small Step-size. Thirdly, corollaries are given showing that pth moment exponential stability of the SDEPCAs (respectively, the one-leg θ method) implies almost sure exponential stability of the SDEPCAs (respectively, the one-leg θ method). Finally, numerical simulations are provided to illustrate the theoretical results.
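
    A sketch of the scheme on the scalar linear test SDE dX = aX dt + bX dW (with no piecewise continuous argument, so this is the plain stochastic one-leg θ step; a, b, θ, the step size, and the path count are illustrative): for a linear drift the implicit one-leg step has a closed form, and a Monte Carlo estimate of E|X_n|² shows the mean-square exponential decay of the exact equation being reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = -2.0, 0.5                 # drift/diffusion of dX = a*X dt + b*X dW
theta, h, N, M = 0.5, 0.01, 500, 2000
X = np.ones(M)                   # M Monte Carlo paths, X(0) = 1
for n in range(N):
    dW = np.sqrt(h) * rng.normal(size=M)
    # one-leg theta step; for linear drift it can be solved explicitly:
    # X_{n+1} = [X_n(1 + (1-theta)ah) + b X_n dW] / (1 - theta*a*h)
    X = (X * (1.0 + (1.0 - theta) * a * h) + b * X * dW) / (1.0 - theta * a * h)

second_moment = float(np.mean(X**2))   # decays since 2a + b^2 < 0
```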

  • Almost sure exponential stability of stochastic differential delay equations
    Filomat, 2019
    Co-Authors: Wei Zhang, M. H. Song, Mingzhu Liu
    Abstract:

    This paper mainly studies whether the almost sure exponential stability of stochastic differential delay equations (SDDEs) is shared by the stochastic theta method. We show that under the global Lipschitz condition the SDDE is pth moment exponentially stable (for p ∈ (0,1)) if and only if the stochastic theta method applied to the SDDE is pth moment exponentially stable, and that pth moment exponential stability of the SDDE or of the stochastic theta method implies the almost sure exponential stability of the SDDE or of the stochastic theta method, respectively. We then replace the global Lipschitz condition with a finite-time convergence condition and establish the same results. Hence, our new theory enables us to study the almost sure exponential stability of SDDEs using the stochastic theta method, instead of the method of Lyapunov functions. That is, we can now perform careful numerical simulations using the stochastic theta method with a Sufficiently Small Step size Δt. If the stochastic theta method is pth moment exponentially stable for a Sufficiently Small p ∈ (0,1), we can then deduce that the underlying SDDE is almost surely exponentially stable. Our new theory also enables the pth moment exponential stability of the stochastic theta method to reproduce the almost sure exponential stability of the SDDEs.

  • Stability and Neimark–Sacker bifurcation in Runge–Kutta methods for a predator–prey system
    International Journal of Computer Mathematics, 2009
    Co-Authors: Qiubao Wang, Mingzhu Liu
    Abstract:

    We investigate the discretization of a predator–prey system with two delays under general Runge–Kutta methods. It is shown that if the exact solution undergoes a Hopf bifurcation at τ = τ*, then the numerical solution undergoes a Neimark–Sacker bifurcation at τ(h) = τ* + O(h^p) for Sufficiently Small Step size h, where p ≥ 1 is the order of the Runge–Kutta method applied. The direction of the Neimark–Sacker bifurcation and the stability of the bifurcating invariant curve are the same as those of the delay differential equation.

  • Numerical Hopf bifurcation of Runge–Kutta methods for a class of delay differential equations
    Chaos Solitons & Fractals, 2009
    Co-Authors: Qiubao Wang, Mingzhu Liu
    Abstract:

    In this paper, we consider the discretization of a parameter-dependent delay differential equation of the form y′(t) = f(y(t), y(t−1), τ), τ ≥ 0, y ∈ ℝ^d. It is shown that if the delay differential equation undergoes a Hopf bifurcation at τ = τ*, then the discrete scheme undergoes a Hopf bifurcation at τ(h) = τ* + O(h^p) for Sufficiently Small Step size h, where p ≥ 1 is the order of the Runge–Kutta method applied. The direction of the numerical Hopf bifurcation and the stability of the bifurcating invariant curve are the same as those of the delay differential equation.
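
    The effect is easy to observe with the order-1 explicit Euler scheme on the delayed logistic (Hutchinson) equation y′(t) = τ·y(t)·(1 − y(t−1)), a standard example of this class with a Hopf bifurcation at τ* = π/2 (this particular f, the step size, and the horizon are illustrative choices, not from the paper): below τ* the numerical solution settles at the equilibrium y = 1, while above τ* it sustains oscillations on an invariant curve.

```python
import numpy as np

def euler_dde(tau, h=0.005, T=200.0):
    # Explicit Euler (an order-1 Runge-Kutta method) for the Hutchinson
    # equation y'(t) = tau * y(t) * (1 - y(t-1)), delay = 1.
    N = int(round(1.0 / h))            # steps per delay interval
    steps = int(round(T / h))
    y = np.empty(steps + N + 1)
    y[: N + 1] = 0.5                   # constant history on [-1, 0]
    for n in range(N, N + steps):
        y[n + 1] = y[n] + h * tau * y[n] * (1.0 - y[n - N])
    return y

# Peak-to-peak amplitude over the last 20 time units:
amp_sub = float(np.ptp(euler_dde(1.0)[-4000:]))   # tau below pi/2: decays to 1
amp_sup = float(np.ptp(euler_dde(2.0)[-4000:]))   # tau above pi/2: oscillates
```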

  • Numerical Hopf bifurcation of linear multistep methods for a class of delay differential equations
    Applied Mathematics and Computation, 2009
    Co-Authors: Mingzhu Liu, Qiubao Wang
    Abstract:

    In this paper, we consider the discretization of a parameter-dependent delay differential equation of the form y′(t) = f(y(t), y(t−1), τ), τ ≥ 0, y ∈ ℝ^d. It is shown that if the delay differential equation undergoes a Hopf bifurcation at τ = τ*, then the discrete scheme undergoes a Hopf bifurcation at τ(h) = τ* + O(h^p) for Sufficiently Small Step size h, where p ≥ 1 is the order of the strictly stable linear multistep method. The direction of the numerical Hopf bifurcation and the stability of the bifurcating invariant curve are the same as those of the corresponding delay differential equation.

Ali H. Sayed - One of the best experts on this subject based on the ideXlab platform.

  • ICASSP - Distributed Coupled Learning Over Adaptive Networks
    2018 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2018
    Co-Authors: Sulaiman A. Alghunaim, Ali H. Sayed
    Abstract:

    This work develops an effective distributed algorithm for the solution of stochastic optimization problems that involve partial coupling among both local constraints and local cost functions. While the collection of networked agents is interested in discovering a global model, the individual agents are sensing data that is only dependent on parts of the model. Moreover, different agents may be dependent on different subsets of the model. In this way, cooperation is justified and also necessary to enable recovery of the global information. In view of the local constraints, we show how to relax the optimization problem to a penalized form, and how to enable cooperation among neighboring agents. We establish mean-square-error convergence of the resulting strategy for Sufficiently Small Step-sizes and large penalty factors. We also illustrate performance by means of simulations.

  • ICASSP - Performance limits of single-agent and multi-agent sub-gradient stochastic learning
    2016 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2016
    Co-Authors: Bicheng Ying, Ali H. Sayed
    Abstract:

    This work examines the performance of stochastic sub-gradient learning strategies, for both cases of stand-alone and networked agents, under weaker conditions than usually considered in the literature. It is shown that these conditions are automatically satisfied by several important cases of interest, including support-vector machines and sparsity-inducing learning solutions. The analysis establishes that sub-gradient strategies can attain exponential convergence rates, as opposed to sub-linear rates, and that they can approach the optimal solution to within O(p) for Sufficiently Small Step-sizes p. A realizable exponential-weighting procedure is proposed to smooth the intermediate iterates and to guarantee these desirable performance properties.
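
    A sketch of the setting on a support-vector machine, one of the cases the abstract names (the data, step size, regularization weight, and weighting factor are invented, and the simple exponentially weighted average below is only a stand-in for the paper's realizable procedure): the hinge loss is non-differentiable, so each iteration takes a subgradient step, and the iterates are smoothed by exponential weighting.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(1.5, 1.0, (100, 2)), rng.normal(-1.5, 1.0, (100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])

mu, lam, beta = 0.01, 0.1, 0.01    # step size, l2 weight, averaging weight
w = np.zeros(2)
w_avg = np.zeros(2)
for t in range(20000):
    i = rng.integers(len(y))
    # subgradient of lam/2 * ||w||^2 + max(0, 1 - y_i * x_i.w)
    g = lam * w - (y[i] * X[i] if y[i] * (X[i] @ w) < 1 else 0.0)
    w -= mu * g
    w_avg = (1 - beta) * w_avg + beta * w   # exponential weighting of iterates

acc = float(np.mean(np.sign(X @ w_avg) == y))   # training accuracy
```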

  • Multitask Diffusion Adaptation Over Asynchronous Networks
    IEEE Transactions on Signal Processing, 2016
    Co-Authors: Roula Nassif, Cedric Richard, André Ferrari, Ali H. Sayed
    Abstract:

    The multitask diffusion LMS is an efficient strategy to simultaneously infer, in a collaborative manner, multiple parameter vectors. Existing works on multitask problems assume that all agents respond to data synchronously. In several applications, agents may not be able to act synchronously because networks can be subject to several sources of uncertainties such as changing topology, random link failures, or agents turning on and off for energy conservation. In this paper, we describe a model for the solution of multitask problems over asynchronous networks and carry out a detailed mean and mean-square error analysis. Results show that Sufficiently Small Step-sizes can still ensure both stability and performance. Simulations and illustrative examples are provided to verify the theoretical findings.
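
    For orientation, a minimal synchronous adapt-then-combine (ATC) diffusion LMS sketch; the paper's asynchronous multitask model reduces to something like this when every agent is always active and all agents share one common task (the network size, uniform combination matrix, noise level, and step-size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, mu = 4, 3, 0.01                 # agents, parameter dimension, step-size
w_true = rng.normal(size=M)           # common task all agents estimate
A = np.full((K, K), 1.0 / K)          # doubly stochastic combination matrix
W = np.zeros((K, M))                  # row k = agent k's estimate
for _ in range(5000):
    # adapt: each agent takes an LMS step on its own streaming sample
    psi = np.empty_like(W)
    for k in range(K):
        u = rng.normal(size=M)                  # regression vector
        d = u @ w_true + 0.01 * rng.normal()    # noisy measurement
        psi[k] = W[k] + mu * (d - u @ W[k]) * u
    # combine: each agent averages its neighbors' intermediate estimates
    W = A @ psi

max_err = float(np.max(np.abs(W - w_true)))
```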

  • ICASSP - Proximal diffusion for stochastic costs with non-differentiable regularizers
    2015 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2015
    Co-Authors: Stefan Vlaski, Ali H. Sayed
    Abstract:

    We consider networks of agents cooperating to minimize a global objective, modeled as the aggregate sum of regularized costs that are not required to be differentiable. Since the subgradients of the individual costs cannot generally be assumed to be uniformly bounded, general distributed subgradient techniques are not applicable to these problems. We isolate the requirement of bounded subgradients into the regularizer and use splitting techniques to develop a stochastic proximal diffusion strategy for solving the optimization problem by continuously learning from streaming data. We represent the implementation as the cascade of three operators and invoke Banach's fixed-point theorem to establish that, despite gradient noise, the stochastic implementation is able to converge in the mean-square-error sense within O(μ) from the optimal solution, for a Sufficiently Small Step-size parameter, μ.
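
    The splitting idea can be seen in a single-agent, single-step sketch (the numbers are arbitrary): the smooth part of the cost is handled by a stochastic gradient step, and the non-differentiable regularizer is handled separately through its proximal operator, which for ℓ1 is the soft-threshold.

```python
import numpy as np

def soft_threshold(x, t):
    # proximal operator of t * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

mu, lam = 0.1, 0.5                          # step-size, regularization weight
w = np.array([0.8, -0.05, 0.3])
grad = np.array([0.2, 0.1, -0.4])           # stochastic gradient of smooth part

# forward (gradient) step on the smooth cost, then backward (prox) step
# on the regularizer:
w_next = soft_threshold(w - mu * grad, mu * lam)
```

    Note how the prox step sets the second coordinate (whose magnitude falls below the threshold) close to zero, which is the sparsity-promoting behavior of the ℓ1 regularizer.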

  • Information-Sharing over Adaptive Networks with Self-interested Agents
    IEEE Transactions on Signal and Information Processing over Networks, 2015
    Co-Authors: Mihaela Van Der Schaar, Ali H. Sayed
    Abstract:

    We examine the behavior of multi-agent networks where information-sharing is subject to a positive communications cost over the edges linking the agents. We consider a general mean-square-error formulation where all agents are interested in estimating the same target vector. We first show that, in the absence of any incentives to cooperate, the optimal strategy for the agents is to behave in a selfish manner with each agent seeking the optimal solution independently of the other agents. Pareto inefficiency arises as a result of the fact that agents are not using historical data to predict the behavior of their neighbors and to know whether they will reciprocate and participate in sharing information. Motivated by this observation, we develop a reputation protocol to summarize the opponent's past actions into a reputation score, which can then be used to form a belief about the opponent's subsequent actions. The reputation protocol entices agents to cooperate and turns their optimal strategy into an action-choosing strategy that enhances the overall social benefit of the network. In particular, we show that when the communications cost becomes large, the expected social benefit of the proposed protocol outperforms the social benefit that is obtained by cooperative agents that always share data. We perform a detailed mean-square-error analysis of the evolution of the network over three domains: far-field, near-field, and middle-field, and show that the network behavior is stable for Sufficiently Small Step-sizes. The various theoretical results are illustrated by numerical simulations.

Sahin Lale - One of the best experts on this subject based on the ideXlab platform.

  • A Study of Generalization of Stochastic Mirror Descent Algorithms on Overparameterized Nonlinear Models
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2020
    Co-Authors: Navid Azizan, Sahin Lale, Babak Hassibi
    Abstract:

    We study the convergence, the implicit regularization, and the generalization of stochastic mirror descent (SMD) algorithms in overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overparameterization, the training loss has infinitely many global minima, which define a manifold of interpolating solutions. To understand the generalization performance of SMD algorithms, it is important to characterize which global minima the SMD algorithms converge to. In this work, we first theoretically show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima, which is usually the case in the highly overparameterized setting, then with a Sufficiently Small Step size SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results are observed in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments, we show that on the CIFAR-10 dataset, SMD with an ℓ10 norm potential (as a surrogate for ℓ∞) consistently generalizes better than SGD (corresponding to an ℓ2 norm potential), which in turn consistently outperforms SMD with an ℓ1 norm potential.

  • Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
    arXiv: Learning, 2019
    Co-Authors: Navid Azizan Ruhi, Sahin Lale, Babak Hassibi
    Abstract:

    Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something that comes for free in the highly overparameterized case), SMD with Sufficiently Small Step size converges to a global minimum that is approximately the closest one in Bregman divergence. On the experimental side, our extensive experiments on standard datasets and models, using various initializations, various mirror descents, and various Bregman divergences, consistently confirm that this phenomenon happens in deep learning. Our experiments further indicate that there is a clear difference in the generalization performance of the solutions obtained by different SMD algorithms. Experimenting on a standard image dataset and network architecture with SMD under different kinds of implicit regularization ($\ell_1$ to encourage sparsity, $\ell_2$ yielding SGD, and $\ell_{10}$ to discourage large components in the parameter vector) consistently and definitively shows that $\ell_{10}$-SMD has better generalization performance than SGD, which in turn has better generalization performance than $\ell_1$-SMD.

Qiao Zhu - One of the best experts on this subject based on the ideXlab platform.

  • Mean-Square Exponential Input-to-State Stability of Numerical Solutions for Stochastic Control Systems
    Acta Automatica Sinica, 2013
    Co-Authors: Qiao Zhu, Jia-rui Cui
    Abstract:

    This paper deals with the mean-square exponential input-to-state stability (exp-ISS) of numerical solutions for stochastic control systems (SCSs). Firstly, it is shown that a finite-time strong convergence condition holds for the stochastic θ-method on SCSs. Then, under the finite-time strong convergence condition, the mean-square exp-ISS of an SCS holds if and only if that of the stochastic θ-method (for Sufficiently Small Step sizes) is preserved. Secondly, for a class of SCSs with a one-sided Lipschitz drift, it is proved that two implicit Euler methods (for any Step size) can inherit the mean-square exp-ISS property of the SCSs. Finally, numerical examples confirm the correctness of the theorems presented in this study.

  • Exponential input-to-state stability of Runge-Kutta methods for neutral delay control systems
    IEEE 10th INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, 2010
    Co-Authors: Yewei Xiao, Hongzhong Tang, Qiao Zhu
    Abstract:

    The aim of this paper is to find conditions under which the Runge-Kutta (RK) method reproduces the exponential input-to-state stability (exp-ISS) behavior of nonlinear neutral delay control systems (NDCSs) without involving a control Lyapunov function. A special continuous RK method is introduced which is equivalent to the discrete RK method with respect to exp-ISS. Under a global Lipschitz condition, boundedness and an appropriate strong convergence are obtained. Under this strong convergence condition, it is shown that, for Sufficiently Small Step-sizes, the exp-ISS of an NDCS holds if and only if that of the RK method is preserved.

  • Mean-square Exponential Input-to-state Stability of Euler-Maruyama Method Applied to Stochastic Control Systems
    Acta Automatica Sinica, 2010
    Co-Authors: Qiao Zhu, Li Zeng
    Abstract:

    This paper deals with the mean-square exponential input-to-state stability (exp-ISS) of the Euler-Maruyama (EM) method applied to stochastic control systems (SCSs). The aim is to find conditions under which the exact and EM solutions of an SCS have the mean-square exp-ISS property, without involving control Lyapunov functions. Second-moment boundedness and an appropriate form of strong convergence are established under global Lipschitz coefficients and mean-square continuous random inputs. Under the strong convergence condition, it is shown that the mean-square exp-ISS of an SCS holds if and only if that of the EM method is preserved for Sufficiently Small Step size.
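
    A sketch of the EM scheme on a scalar linear system with a bounded input, dX = (aX + u(t)) dt + bX dW (the linear form, the coefficients, and the input signal are illustrative, not the paper's general SCS): with a stable drift, a Monte Carlo estimate of the second moment stays bounded in terms of the input bound, which is the input-to-state flavor of stability the abstract studies.

```python
import numpy as np

rng = np.random.default_rng(4)
a, b, h, N, M = -2.0, 0.5, 0.01, 1000, 2000   # coefficients, step, steps, paths
u = lambda t: 0.2 * np.sin(t)                 # bounded deterministic input

X = np.ones(M)                                # X(0) = 1 on every path
for n in range(N):
    dW = np.sqrt(h) * rng.normal(size=M)
    # Euler-Maruyama step: drift increment + diffusion increment
    X = X + h * (a * X + u(n * h)) + b * X * dW

second_moment = float(np.mean(X**2))          # remains bounded by the input
```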