Activated gradients for deep neural networks

M Liu, L Chen, X Du, L Jin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks often suffer from poor performance or even training failure due to
ill-conditioning, the vanishing/exploding gradient problem, and the saddle point …
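
The "activated gradient" idea named in the title passes raw gradients through an element-wise activation before the parameter update. The exact activation used in the paper is not visible from this snippet, so the sketch below uses a tanh-based rescaling purely as an illustrative stand-in; the function name and the scale parameter are assumptions, not taken from the paper.

import numpy as np

def activated_sgd_step(params, grads, lr=0.01, scale=1.0):
    # One SGD step in which each gradient is passed through an element-wise
    # activation before being applied.  The tanh rescaling is an illustrative
    # choice, not the activation proposed in the paper: it bounds exploding
    # gradients while keeping small gradients approximately linear.
    return [p - lr * scale * np.tanh(g / scale) for p, g in zip(params, grads)]

# Toy usage: one parameter vector whose gradient has a very large entry.
params = [np.array([0.5, -1.2, 3.0])]
grads = [np.array([0.1, 50.0, -0.01])]   # the 50.0 would otherwise dominate the step
params = activated_sgd_step(params, grads)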

Long short-term memory with activation on gradient

C Qin, L Chen, Z Cai, M Liu, L Jin - Neural Networks, 2023 - Elsevier
As the number of long short-term memory (LSTM) layers increases, vanishing/exploding
gradient problems are exacerbated and degrade the performance of the LSTM …

Optimal first-order methods for convex functions with a quadratic upper bound

B Goujaud, A Taylor, A Dieuleveut - arXiv preprint arXiv:2205.15033, 2022 - arxiv.org
We analyze worst-case convergence guarantees of first-order optimization methods over a
function class extending that of smooth and convex functions. This class contains convex …
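
For orientation, one standard way to formalize a "quadratic upper bound" is the condition below, which is implied by L-smoothness at a minimizer; whether this matches the paper's exact class definition should be checked against the paper itself.

% A quadratic upper bound condition with constant L > 0:
\[
  f(x) - f_\star \;\le\; \frac{L}{2}\, \lVert x - x_\star \rVert^2
  \qquad \text{for all } x,
\]
% to be compared with the usual smoothness upper bound
\[
  f(y) \;\le\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\, \lVert y - x \rVert^2 .
\]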

Gradient descent is optimal under lower restricted secant inequality and upper error bound

C Guille-Escuret, A Ibrahim… - Advances in Neural …, 2022 - proceedings.neurips.cc
The study of first-order optimization is sensitive to the assumptions made on the objective
functions. These assumptions induce complexity classes which play a key role in worst-case …
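
The two conditions in the title are usually stated relative to the projection Π(x) of x onto the solution set; the standard forms are reproduced below for reference (the constants μ and L follow the usual convention and may differ from the paper's notation).

% Lower restricted secant inequality (RSI_mu) and upper error bound (EB_L),
% stated with respect to the projection \Pi(x) of x onto the solution set:
\[
  \langle \nabla f(x),\, x - \Pi(x) \rangle \;\ge\; \mu\, \lVert x - \Pi(x) \rVert^2
  \qquad (\mathrm{RSI}_\mu),
\]
\[
  \lVert \nabla f(x) \rVert \;\le\; L\, \lVert x - \Pi(x) \rVert
  \qquad (\mathrm{EB}_L).
\]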

An exponentially converging particle method for the mixed Nash equilibrium of continuous games

G Wang, L Chizat - arXiv preprint arXiv:2211.01280, 2022 - arxiv.org
We consider the problem of computing mixed Nash equilibria of two-player zero-sum games
with continuous sets of pure strategies and with first-order access to the payoff function. This …
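
To make the setting concrete, the sketch below represents each player's mixed strategy by a cloud of particles and runs plain noisy gradient ascent-descent on a toy payoff. It is only a generic heuristic under simplifying assumptions (equal particle weights, Euclidean strategy spaces, a made-up payoff), not the exponentially converging dynamics proposed in the paper.

import numpy as np

def particle_gda(payoff_grad_x, payoff_grad_y, n=64, d=1, steps=1000,
                 lr=0.05, temp=0.01, rng=np.random.default_rng(0)):
    # Generic particle gradient ascent-descent for a two-player zero-sum game
    # with continuous strategy sets.  Each player's mixed strategy is the
    # empirical measure of n equally weighted particles; player X ascends and
    # player Y descends the expected payoff, with a little Langevin noise to
    # keep the strategies diffuse.
    X = rng.normal(size=(n, d))   # particles of player 1's mixed strategy
    Y = rng.normal(size=(n, d))   # particles of player 2's mixed strategy
    for _ in range(steps):
        gx = payoff_grad_x(X, Y)  # per-particle gradient of the mean payoff
        gy = payoff_grad_y(X, Y)
        X = X + lr * gx + np.sqrt(2 * lr * temp) * rng.normal(size=X.shape)
        Y = Y - lr * gy + np.sqrt(2 * lr * temp) * rng.normal(size=Y.shape)
    return X, Y

# Toy payoff f(x, y) = sin(x) * cos(y), averaged over both particle clouds.
grad_x = lambda X, Y: np.cos(X) * np.mean(np.cos(Y))
grad_y = lambda X, Y: -np.mean(np.sin(X)) * np.sin(Y)
X, Y = particle_gda(grad_x, grad_y)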

Communication-efficient federated learning: A second-order Newton-type method with analog over-the-air aggregation

M Krouka, A Elgabli, CB Issaid… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Owing to their fast convergence, second-order Newton-type learning methods have recently
received attention in the federated learning (FL) setting. However, current solutions are …
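
A rough picture of what analog over-the-air aggregation of Newton-type updates means: every client transmits its local update simultaneously, and the wireless channel itself superimposes the signals, so the server only ever observes a noisy sum. The quadratic local losses, the ideal power control, and all names below are simplifying assumptions for illustration, not the authors' scheme.

import numpy as np

rng = np.random.default_rng(0)

def local_newton_update(w, A, b, damping=1e-3):
    # Local Newton-type step for a client quadratic loss 0.5*||A w - b||^2:
    # the damped Hessian is inverted locally, so only a d-dimensional update
    # vector needs to be transmitted to the server.
    g = A.T @ (A @ w - b)
    H = A.T @ A + damping * np.eye(w.size)
    return np.linalg.solve(H, g)

def over_the_air_sum(updates, noise_std=0.01):
    # Analog aggregation: all clients transmit at once and the channel adds
    # their waveforms, so the server observes the sum of the updates plus
    # receiver noise (ideal power control assumed).
    s = np.sum(updates, axis=0)
    return s + noise_std * rng.normal(size=s.shape)

# One toy federated round: 4 clients, shared model w in R^5.
d, clients = 5, 4
w = np.zeros(d)
data = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(clients)]
updates = [local_newton_update(w, A, b) for A, b in data]
w = w - over_the_air_sum(updates) / clients   # averaged Newton-type step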

On the Convergence of AdaGrad (Norm) on R^d: Beyond Convexity, Non-Asymptotic Rate and Acceleration

Z Liu, TD Nguyen, A Ene, H Nguyen - International Conference on …, 2023 - par.nsf.gov
Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is
typically for functions with bounded domain diameter. In unconstrained problems, previous …
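
For reference, the AdaGrad-Norm variant referred to in the title adapts a single scalar step size from the accumulated squared gradient norms (rather than per-coordinate sums); a minimal implementation on an unconstrained toy problem:

import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, steps=200):
    # AdaGrad-Norm on R^d: one scalar step size, divided by the square root
    # of the running sum of squared gradient norms.
    x, b2 = np.asarray(x0, dtype=float), b0 ** 2
    for _ in range(steps):
        g = grad(x)
        b2 += float(g @ g)                # accumulate the squared gradient norm
        x = x - (eta / np.sqrt(b2)) * g
    return x

# Toy smooth convex objective f(x) = 0.5 * ||x||^2 on an unbounded domain.
x_final = adagrad_norm(lambda x: x, x0=np.array([5.0, -3.0]))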

DIN: A decentralized inexact Newton algorithm for consensus optimization

A Ghalkha, CB Issaid, A Elgabli… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This paper tackles a challenging decentralized consensus optimization problem defined
over a network of interconnected devices. The devices work collaboratively to solve a …
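
As a generic illustration of the problem setting only, and explicitly not the DIN algorithm itself: in the sketch below each node computes an inexact Newton direction for its own local loss with a few conjugate-gradient iterations, then averages its iterate with its neighbors through a doubly stochastic mixing matrix. The quadratic local losses and all parameter choices are assumptions made for the toy example.

import numpy as np

def cg(H, g, iters=3):
    # A few conjugate-gradient iterations yield an *inexact* Newton direction.
    x, r = np.zeros_like(g), g.copy()
    p = r.copy()
    for _ in range(iters):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

def decentralized_inexact_newton_sketch(As, bs, W, rounds=50, damping=1e-2):
    # Each node i takes an inexact Newton step on its local quadratic loss
    # 0.5*||A_i w - b_i||^2, then mixes its iterate with its neighbors via
    # the doubly stochastic weight matrix W (consensus averaging).
    n, d = len(As), As[0].shape[1]
    Ws = np.zeros((n, d))
    for _ in range(rounds):
        local = np.zeros_like(Ws)
        for i, (A, b) in enumerate(zip(As, bs)):
            g = A.T @ (A @ Ws[i] - b)
            H = A.T @ A + damping * np.eye(d)
            local[i] = Ws[i] - cg(H, g)
        Ws = W @ local
    return Ws

# Toy ring of 4 nodes with a doubly stochastic mixing matrix.
rng = np.random.default_rng(1)
As = [rng.normal(size=(10, 3)) for _ in range(4)]
bs = [rng.normal(size=10) for _ in range(4)]
W = np.array([[.50, .25, .00, .25],
              [.25, .50, .25, .00],
              [.00, .25, .50, .25],
              [.25, .00, .25, .50]])
iterates = decentralized_inexact_newton_sketch(As, bs, W)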

Constrained minimum variance and covariance steering based on affine disturbance feedback control parameterization

IM Balci, E Bakolas - … Journal of Robust and Nonlinear Control, 2024 - Wiley Online Library
This paper deals with finite-horizon minimum-variance and covariance steering problems
subject to constraints. The goal of the minimum variance problem is to steer the state mean …
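
The affine disturbance feedback parameterization named in the title writes each control input as an affine function of the disturbances observed so far, which keeps the closed-loop state an affine function of the noise sequence. The standard form is shown below for a linear system with additive noise; that system model is an assumption about the setting, not a restatement of the paper.

% Affine disturbance feedback for x_{k+1} = A_k x_k + B_k u_k + w_k:
% each input is an affine function of the past disturbances,
\[
  u_k \;=\; \bar{u}_k \;+\; \sum_{j=0}^{k-1} K_{k,j}\, w_j ,
\]
% with feedforward terms \bar{u}_k and feedback gains K_{k,j} as decision variables.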

Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach

G Wang, A Moussavi-Hosseini, L Chizat - arXiv preprint arXiv:2406.17054, 2024 - arxiv.org
Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle
convex optimization over probability measures on a manifold, which are scalable, versatile …
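
As a reference point for what the base dynamics look like before the signed-measure/bilevel extension developed in the paper: standard MFLD evolves a cloud of interacting particles by noisy gradient steps on the first variation of the objective. The toy objective below (matching the empirical mean to a target) and all parameter names are assumptions made for illustration.

import numpy as np

def mfld(particles, target_mean, steps=500, lr=0.05, temp=0.05,
         rng=np.random.default_rng(0)):
    # Standard mean-field Langevin particle update for the toy energy
    #   F(mu) = 0.5 * || E_mu[x] - target_mean ||^2  (+ entropy at temperature temp),
    # whose first-variation gradient at every particle is mean(X) - target_mean.
    X = particles.copy()
    for _ in range(steps):
        grad = X.mean(axis=0) - target_mean          # same drift for every particle
        X = X - lr * grad + np.sqrt(2 * lr * temp) * rng.normal(size=X.shape)
    return X

# 256 particles in R^2 whose empirical mean drifts toward target_mean.
X = mfld(np.zeros((256, 2)), target_mean=np.array([1.0, -2.0]))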