Rtra: Rapid training of regularization-based approaches in continual learning
S Nokhwal, N Kumar - 2023 10th International Conference on …, 2023 - ieeexplore.ieee.org
Catastrophic forgetting (CF) is a significant challenge in continual learning (CL). In
regularization-based approaches to mitigate CF, modifications to important training …
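The snippet does not show the paper's specific penalty, but a representative regularization term in this family (an EWC-style quadratic penalty, used here only as an illustration, not as Rtra's own method) anchors parameters that were important for earlier tasks:

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \frac{\lambda}{2}\sum_i F_i\,\bigl(\theta_i - \theta_i^{*}\bigr)^2,$$

where $\theta^{*}$ are the parameters learned on previous tasks, $F_i$ is a per-parameter importance weight (e.g., a Fisher-information estimate), and $\lambda$ trades off plasticity against forgetting.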
Doubly adaptive scaled algorithm for machine learning using second-order information
We present a novel adaptive optimization algorithm for large-scale machine learning
problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …
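The abstract does not spell out the estimator, but low-cost curvature and Lipschitz-smoothness estimates of this kind are usually built from successive iterates and gradients; an illustrative (not necessarily the paper's) choice is

$$s_k = x_k - x_{k-1}, \qquad y_k = \nabla f(x_k) - \nabla f(x_{k-1}), \qquad \hat{L}_k = \frac{\|y_k\|}{\|s_k\|}, \qquad \hat{c}_k = \frac{s_k^{\top} y_k}{\|s_k\|^{2}},$$

where $\hat{L}_k$ approximates a local Lipschitz constant of the gradient and $\hat{c}_k$ the local curvature along the last step, both computable at essentially no extra cost.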
New analysis of linear convergence of gradient-type methods via unifying error bound conditions
H Zhang - Mathematical Programming, 2020 - Springer
This paper reveals that a common and central role in many error bound (EB) conditions and in a variety of gradient-type methods is played by a residual measure operator. On one …
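A typical residual measure operator, given here only as an illustration of the concept rather than a quote from the paper, is the proximal-gradient mapping for composite minimization of $f + g$:

$$r(x) = x - \operatorname{prox}_{\alpha g}\bigl(x - \alpha \nabla f(x)\bigr),$$

which vanishes exactly at the solutions; an error bound condition then requires $\operatorname{dist}(x, X^{*}) \le \kappa\,\|r(x)\|$ near the solution set $X^{*}$, and bounds of this form are what drive linear convergence of gradient-type methods.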
The condition number of a function relative to a set
The condition number of a differentiable convex function, namely the ratio of its smoothness
to strong convexity constants, is closely tied to fundamental properties of the function. In …
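Concretely, for an $L$-smooth and $\mu$-strongly convex function $f$ the condition number referred to here is

$$\kappa(f) = \frac{L}{\mu},$$

the ratio of the smoothness constant $L$ to the strong convexity constant $\mu$; it governs, for example, the linear convergence rate of gradient descent on $f$.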
Efficient distributed Hessian free algorithm for large-scale empirical risk minimization via accumulating sample strategy
In this paper, we propose a Distributed Accumulated Newton Conjugate gradiEnt (DANCE)
method in which sample size is gradually increasing to quickly obtain a solution whose …
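The snippet does not give the growth schedule, so the following Python sketch of an accumulating-sample strategy uses an assumed geometric growth factor and a plain gradient step in place of the paper's Newton-CG direction; it only illustrates the "gradually increasing sample size" idea, not the full DANCE method.

import numpy as np

def accumulating_sample_descent(subsampled_grad, n, x0, step=0.1, growth=2.0, rounds=20, seed=0):
    """Descent in which the sample used to estimate the gradient grows
    geometrically: early rounds are cheap, later rounds approach the
    full-batch gradient. `subsampled_grad(x, idx)` is assumed to return
    the average gradient over the component functions indexed by `idx`."""
    rng = np.random.default_rng(seed)
    x, m = np.asarray(x0, dtype=float), 1
    for _ in range(rounds):
        idx = rng.choice(n, size=min(m, n), replace=False)
        x = x - step * subsampled_grad(x, idx)  # step with the subsampled gradient
        m = int(np.ceil(growth * m))            # accumulate: enlarge the sample
    return x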
A minibatch proximal stochastic recursive gradient algorithm using a trust-region-like scheme and Barzilai–Borwein stepsizes
We consider the problem of minimizing the sum of an average of a large number of smooth
convex component functions and a possibly nonsmooth convex function that admits a simple …
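The Barzilai–Borwein stepsizes named in the title are computed from consecutive iterate and gradient differences; the two classical variants (the snippet does not say which one the method adopts) are

$$\alpha_k^{\mathrm{BB1}} = \frac{s_{k-1}^{\top} s_{k-1}}{s_{k-1}^{\top} y_{k-1}}, \qquad \alpha_k^{\mathrm{BB2}} = \frac{s_{k-1}^{\top} y_{k-1}}{y_{k-1}^{\top} y_{k-1}}, \qquad s_{k-1} = x_k - x_{k-1},\; y_{k-1} = v_k - v_{k-1},$$

where $v_k$ denotes the gradient (or, in this stochastic recursive setting, its estimate) at $x_k$.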
SONIA: a symmetric blockwise truncated optimization algorithm
This work presents a new optimization algorithm for empirical risk minimization. The
algorithm bridges the gap between first- and second-order methods by computing a search …
The condition of a function relative to a polytope
DH Gutman, JF Pena - arXiv preprint arXiv:1802.00271, 2018 - arxiv.org
The condition number of a smooth convex function, namely the ratio of its smoothness to
strong convexity constants, is closely tied to fundamental properties of the function. In …
Gradient Descent and the Power Method: Exploiting their connection to find the leftmost eigen-pair and escape saddle points
R Tappenden, M Takáč - arXiv preprint arXiv:2211.00866, 2022 - arxiv.org
This work shows that applying Gradient Descent (GD) with a fixed step size to minimize a
(possibly nonconvex) quadratic function is equivalent to running the Power Method (PM) on …
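The stated equivalence is straightforward to make precise for a quadratic $f(x) = \tfrac{1}{2} x^{\top} A x - b^{\top} x$: gradient descent with fixed step size $\alpha$ gives

$$x_{k+1} = x_k - \alpha\,(A x_k - b) = (I - \alpha A)\,x_k + \alpha b,$$

so for $b = 0$ (or after shifting by the minimizer) the iterates are, up to the normalization used in PM, power-method iterates on $I - \alpha A$; for a suitably large step size the dominant eigenvector of $I - \alpha A$ corresponds to the leftmost eigenvalue of $A$, which is the connection the title refers to for finding the leftmost eigen-pair and escaping saddle points.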
Efficient and Scalable Optimization Methods for Training Large-Scale Machine Learning Models
M Jahani - 2021 - search.proquest.com
Many important problems in machine learning (ML) and data science are formulated as
optimization problems and solved using optimization algorithms. With the scale of modern …