Activated gradients for deep neural networks
Deep neural networks often suffer from poor performance or even training failure due to
ill-conditioning, the vanishing/exploding gradient problem, and the saddle point …
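A minimal sketch of the general idea, assuming PyTorch and using tanh as a stand-in gradient activation (the paper's actual activation function may differ): a backward hook passes each parameter's gradient through a bounded nonlinearity before the optimizer sees it.

import torch

def activate_gradient(grad):
    # Hypothetical gradient activation: squash each entry through a
    # bounded function so huge entries shrink smoothly, sign preserved.
    return torch.tanh(grad)

model = torch.nn.Linear(10, 1)
for p in model.parameters():
    p.register_hook(activate_gradient)  # runs during backward()

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()  # p.grad now holds the activated gradients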
Long short-term memory with activation on gradient
As the number of long short-term memory (LSTM) layers increases, vanishing/exploding
gradient problems worsen and degrade the performance of the LSTM …
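The same hook mechanism extends to stacked LSTMs, where gradients traverse many layers and time steps; an illustrative sketch (tanh is again a hypothetical stand-in for the paper's gradient activation):

import torch

lstm = torch.nn.LSTM(input_size=16, hidden_size=32, num_layers=4)
for p in lstm.parameters():
    p.register_hook(torch.tanh)  # bound each layer's gradients

seq = torch.randn(50, 8, 16)     # (time, batch, features)
out, _ = lstm(seq)
out.mean().backward()            # long backprop path, gradients squashed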
Optimal first-order methods for convex functions with a quadratic upper bound
We analyze worst-case convergence guarantees of first-order optimization methods over a
function class extending that of smooth and convex functions. This class contains convex …
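For reference, and as an assumption about which inequality the snippet means: an L-smooth convex function satisfies the quadratic upper bound

\[
f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\|y - x\|^2
\qquad \text{for all } x, y,
\]

and the extended class retains only an upper bound of this form, without requiring full smoothness.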
Gradient descent is optimal under lower restricted secant inequality and upper error bound
The study of first-order optimization is sensitive to the assumptions made on the objective
functions. These assumptions induce complexity classes which play a key role in worst-case …
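The two conditions in the title are commonly stated as follows (standard formulations; the paper's exact notation may differ), with $x_p$ the projection of $x$ onto the set of minimizers:

\[
\langle \nabla f(x),\, x - x_p \rangle \;\ge\; \mu\,\|x - x_p\|^2
\quad \text{(lower restricted secant inequality)},
\qquad
\|\nabla f(x)\| \;\le\; L\,\|x - x_p\|
\quad \text{(upper error bound)}.
\]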
An exponentially converging particle method for the mixed Nash equilibrium of continuous games
We consider the problem of computing mixed Nash equilibria of two-player zero-sum games
with continuous sets of pure strategies and with first-order access to the payoff function. This …
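As a toy illustration of the setting, not the paper's exponentially converging method: each player's mixed strategy is a particle cloud, updated by descent/ascent on the expected payoff under the opponent's empirical distribution. The payoff f(x, y) = sin(x - y) is an arbitrary stand-in.

import numpy as np

def grad_x(x, y):    # df/dx for f(x, y) = sin(x - y)
    return np.cos(x - y)

def grad_y(x, y):    # df/dy for the same payoff
    return -np.cos(x - y)

rng = np.random.default_rng(0)
X = rng.normal(size=64)   # min player's particles (pure strategies)
Y = rng.normal(size=64)   # max player's particles
eta = 0.05
for _ in range(1000):
    gx = grad_x(X[:, None], Y[None, :]).mean(axis=1)  # E_Y[df/dx]
    gy = grad_y(X[:, None], Y[None, :]).mean(axis=0)  # E_X[df/dy]
    X, Y = X - eta * gx, Y + eta * gy  # descent for min, ascent for max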
Communication-efficient federated learning: A second-order Newton-type method with analog over-the-air aggregation
Owing to their fast convergence, second-order Newton-type learning methods have recently
received attention in the federated learning (FL) setting. However, current solutions are …
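A hedged sketch of the generic pattern such methods follow (the local losses, damping, and noise model are illustrative stand-ins, not the paper's protocol): each device computes a local Newton-type direction, and analog over-the-air aggregation is modeled by the channel summing all transmissions plus noise.

import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 10                        # dimension, number of devices
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [rng.normal(size=20) for _ in range(K)]
w = np.zeros(d)

for _ in range(20):
    directions = []
    for Ak, bk in zip(A, b):        # local least-squares loss per device
        g = Ak.T @ (Ak @ w - bk) / len(bk)
        H = Ak.T @ Ak / len(bk) + 1e-3 * np.eye(d)
        directions.append(np.linalg.solve(H, g))  # local Newton direction
    # Analog over-the-air: the channel sums the signals; noise is inherent.
    aggregated = sum(directions) + 0.01 * rng.normal(size=d)
    w -= aggregated / K             # averaged Newton-type step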
On the Convergence of AdaGrad (Norm) on R^d: Beyond Convexity, Non-Asymptotic Rate and Acceleration
Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is
typically for functions with bounded domain diameter. In unconstrained problems, previous …
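For concreteness, AdaGrad-Norm uses one scalar step size driven by the accumulated squared gradient norms, so no bound on the domain diameter enters the update; a minimal sketch on a toy quadratic (constants arbitrary):

import numpy as np

def grad(x):                 # gradient of f(x) = 0.5 * ||x||^2
    return x

x = np.full(4, 10.0)
eta, b2 = 1.0, 1e-8          # b2 accumulates squared gradient norms
for _ in range(500):
    g = grad(x)
    b2 += np.dot(g, g)
    x -= eta / np.sqrt(b2) * g   # single adaptive scalar step size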
DIN: A decentralized inexact Newton algorithm for consensus optimization
This paper tackles a challenging decentralized consensus optimization problem defined
over a network of interconnected devices. The devices work collaboratively to solve a …
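A generic sketch of the decentralized Newton-type template (illustrative only; DIN's specific inexact steps and communication scheme are not reproduced): nodes average iterates with neighbors through a mixing matrix W and damp local Newton steps on their own objectives.

import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 3
H = [np.diag(rng.uniform(1, 3, size=d)) for _ in range(n)]  # local Hessians
c = [rng.normal(size=d) for _ in range(n)]                  # local linear terms
W = np.full((n, n), 1.0 / n)   # doubly stochastic mixing (complete graph)
X = np.zeros((n, d))           # one iterate per node

for _ in range(50):
    X = W @ X                                       # consensus averaging
    for i in range(n):
        g = H[i] @ X[i] + c[i]                      # local gradient
        X[i] -= 0.5 * np.linalg.solve(H[i], g)      # damped Newton step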
Constrained minimum variance and covariance steering based on affine disturbance feedback control parameterization
This paper deals with finite-horizon minimum-variance and covariance steering problems
subject to constraints. The goal of the minimum variance problem is to steer the state mean …
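A common form of the affine disturbance feedback parameterization (stated as background; the paper's precise formulation may differ) makes the control affine in past disturbances rather than in the state:

\[
u_k \;=\; \bar{u}_k + \sum_{i=0}^{k-1} K_{k,i}\, w_i ,
\]

where $\bar{u}_k$ is a feedforward term and the gains $K_{k,i}$ act on the realized disturbances $w_i$; a standard motivation for this choice is that the resulting constraints are convex in the decision variables $(\bar{u}, K)$.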
Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach
Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle
convex optimization over probability measures on a manifold, which are scalable, versatile …
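The basic discretization behind MFLD is noisy particle gradient descent; a minimal sketch with an arbitrary quadratic potential (the paper's bilevel treatment of signed measures is not reproduced here):

import numpy as np

def grad_V(x):               # gradient of the potential V(x) = x^2 / 2
    return x

rng = np.random.default_rng(2)
X = rng.normal(size=256)     # particles approximating the measure
eta, lam = 0.01, 0.1         # step size, entropic regularization strength
for _ in range(2000):
    noise = rng.normal(size=X.shape)
    X = X - eta * grad_V(X) + np.sqrt(2 * eta * lam) * noise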