Rtra: Rapid training of regularization-based approaches in continual learning

S Nokhwal, N Kumar - 2023 10th International Conference on …, 2023 - ieeexplore.ieee.org
Catastrophic forgetting (CF) is a significant challenge in continual learning (CL). In
regularization-based approaches to mitigate CF, modifications to important training …
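
For orientation, regularization-based continual learning methods typically augment the loss on a new task with a penalty that anchors parameters deemed important for earlier tasks. The template below illustrates the family in general (an EWC-style quadratic penalty); it is not necessarily the Rtra formulation:

    % Generic regularization-based CL objective (illustrative only).
    % \Omega_i weights the importance of parameter \theta_i for old tasks,
    % e.g. a diagonal Fisher information estimate.
    \[
      \mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{new}}(\theta)
      \;+\; \frac{\lambda}{2} \sum_i \Omega_i \bigl(\theta_i - \theta_i^{\mathrm{old}}\bigr)^2
    \]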

Doubly adaptive scaled algorithm for machine learning using second-order information

M Jahani, S Rusakov, Z Shi, P Richtárik… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a novel adaptive optimization algorithm for large-scale machine learning
problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …

New analysis of linear convergence of gradient-type methods via unifying error bound conditions

H Zhang - Mathematical Programming, 2020 - Springer
This paper reveals that a residual measure operator plays a common and central role in
many error bound (EB) conditions and a variety of gradient-type methods. On one …
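
A representative error bound condition of the kind this line of work unifies, stated generically (the paper's residual measure operator abstracts the gradient norm below), bounds the distance to the solution set X* by the residual; together with L-smoothness and convexity, such a condition yields a linear rate for gradient descent:

    \[
      \operatorname{dist}(x, X^\ast) \;\le\; \kappa \,\|\nabla f(x)\|
      \quad \text{for all } x \text{ with } f(x) \le f(x_0),
    \]
    \[
      f(x_k) - f^\ast \;\le\; (1 - \rho)^k \bigl(f(x_0) - f^\ast\bigr)
      \quad \text{for some } \rho = \rho(L, \kappa) \in (0, 1).
    \]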

The condition number of a function relative to a set

DH Gutman, JF Peña - Mathematical Programming, 2021 - Springer
The condition number of a differentiable convex function, namely the ratio of its smoothness
to strong convexity constants, is closely tied to fundamental properties of the function. In …
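
For concreteness, the two constants in question are the usual smoothness and strong convexity moduli, and their ratio governs the classical linear rate of gradient descent; this is the standard unconstrained definition that the paper relativizes to a set:

    \[
      \|\nabla f(x) - \nabla f(y)\| \;\le\; L \,\|x - y\|,
      \qquad
      f(y) \;\ge\; f(x) + \langle \nabla f(x), y - x \rangle + \tfrac{\mu}{2}\,\|y - x\|^2,
    \]
    \[
      \kappa(f) \;=\; \frac{L}{\mu},
      \qquad
      f(x_k) - f^\ast \;\le\; \Bigl(1 - \tfrac{1}{\kappa(f)}\Bigr)^{k} \bigl(f(x_0) - f^\ast\bigr)
      \quad \text{(gradient descent with step } 1/L\text{)}.
    \]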

Efficient distributed Hessian free algorithm for large-scale empirical risk minimization via accumulating sample strategy

M Jahani, X He, C Ma, A Mokhtari… - International …, 2020 - proceedings.mlr.press
In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE)
method in which sample size is gradually increasing to quickly obtain a solution whose …
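
The accumulating-sample idea is easy to sketch: run Hessian-free Newton-CG on a small subsample and grow the sample each outer iteration until it covers the full data set. The Python sketch below illustrates this for L2-regularized logistic regression; the doubling schedule, fixed CG budget, and full undamped Newton step are assumptions for illustration, not the DANCE specification:

    import numpy as np

    def grad(X, y, w, lam):
        """Gradient of L2-regularized logistic loss, labels y in {0, 1}."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        return X.T @ (p - y) / X.shape[0] + lam * w

    def hess_vec(X, w, v, lam):
        """Hessian-vector product: H v = X^T diag(p(1-p)) X v / n + lam v."""
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        return X.T @ ((p * (1.0 - p)) * (X @ v)) / X.shape[0] + lam * v

    def cg(hv, b, iters=20, tol=1e-8):
        """Conjugate gradient for H x = b using only Hessian-vector products."""
        x = np.zeros_like(b)
        r, p = b.copy(), b.copy()
        rs = r @ r
        for _ in range(iters):
            Hp = hv(p)
            a = rs / (p @ Hp)
            x += a * p
            r -= a * Hp
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    def newton_cg_accumulating(X, y, lam=1e-3, growth=2):
        """Newton-CG on a growing data prefix (rows assumed pre-shuffled);
        the sample doubles per outer iteration until the full set is used."""
        n, d = X.shape
        w = np.zeros(d)
        m = min(n, max(d, 32))           # initial sample size (assumption)
        while True:
            Xs, ys = X[:m], y[:m]
            w -= cg(lambda v: hess_vec(Xs, w, v, lam), grad(Xs, ys, w, lam))
            if m == n:
                return w
            m = min(growth * m, n)

Given a pre-shuffled design matrix X (n x d) and labels y in {0, 1}, w = newton_cg_accumulating(X, y) returns the full-data solution; early outer iterations are cheap because the gradient and every Hessian-vector product touch only the current sample.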

A minibatch proximal stochastic recursive gradient algorithm using a trust-region-like scheme and Barzilai–Borwein stepsizes

T Yu, XW Liu, YH Dai, J Sun - IEEE Transactions on Neural …, 2020 - ieeexplore.ieee.org
We consider the problem of minimizing the sum of an average of a large number of smooth
convex component functions and a possibly nonsmooth convex function that admits a simple …
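
The Barzilai–Borwein stepsizes named in the title are the standard secant-based choices recalled below; how they are combined with the stochastic recursive gradient estimator and the trust-region-like safeguard is specific to the paper:

    \[
      s_{k-1} = x_k - x_{k-1},
      \qquad
      y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1}),
    \]
    \[
      \alpha_k^{\mathrm{BB1}} = \frac{s_{k-1}^{\top} s_{k-1}}{s_{k-1}^{\top} y_{k-1}},
      \qquad
      \alpha_k^{\mathrm{BB2}} = \frac{s_{k-1}^{\top} y_{k-1}}{y_{k-1}^{\top} y_{k-1}}.
    \]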

SONIA: a symmetric blockwise truncated optimization algorithm

M Jahani, M Nazari, R Tappenden… - International …, 2021 - proceedings.mlr.press
This work presents a new optimization algorithm for empirical risk minimization. The
algorithm bridges the gap between first- and second-order methods by computing a search …

The condition of a function relative to a polytope

DH Gutman, JF Peña - arXiv preprint arXiv:1802.00271, 2018 - arxiv.org
The condition number of a smooth convex function, namely the ratio of its smoothness to
strong convexity constants, is closely tied to fundamental properties of the function. In …

Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points

R Tappenden, M Takáč - arXiv preprint arXiv:2211.00866, 2022 - arxiv.org
This work shows that applying Gradient Descent (GD) with a fixed step size to minimize a
(possibly nonconvex) quadratic function is equivalent to running the Power Method (PM) on …
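
The stated equivalence is direct to verify: on f(x) = (1/2) x^T A x, the fixed-step update x_{k+1} = x_k - alpha A x_k equals (I - alpha A) x_k, i.e. an unnormalized power-method step on M = I - alpha A, whose dominant eigenvector (for a suitable alpha) is the leftmost eigenvector of A. A minimal NumPy check, where the step-size choice and iteration count are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2                    # symmetric, generically indefinite

    alpha = 1.0 / np.abs(np.linalg.eigvalsh(A)).max()
    x = rng.standard_normal(n)

    # GD on f(x) = 0.5 x^T A x is x <- (I - alpha A) x; normalizing each
    # iterate makes this literally the power method on M = I - alpha A.
    for _ in range(5000):
        x -= alpha * (A @ x)
        x /= np.linalg.norm(x)

    print(x @ (A @ x))                   # Rayleigh quotient of the GD iterate
    print(np.linalg.eigvalsh(A).min())   # leftmost eigenvalue: should match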

Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models

M Jahani - 2021 - search.proquest.com
Many important problems in machine learning (ML) and data science are formulated as
optimization problems and solved using optimization algorithms. With the scale of modern …