Kronecker-factored approximate curvature for modern neural network architectures
The core components of many modern neural network architectures, such as transformers,
convolutional, or graph neural networks, can be expressed as linear layers with weight …
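(For context: a minimal NumPy sketch of the Kronecker-factored curvature idea this line of work builds on, for a single linear layer. The function name, damping value, and shapes are illustrative assumptions, not the paper's implementation.)

```python
import numpy as np

def kfac_precondition(grad_W, acts, grad_out, damping=1e-3):
    """Precondition one linear layer's gradient with a Kronecker-factored
    curvature approximation F ~ A (x) G, where A is the second moment of the
    layer inputs and G the second moment of the output gradients."""
    batch = acts.shape[0]
    A = acts.T @ acts / batch + damping * np.eye(acts.shape[1])
    G = grad_out.T @ grad_out / batch + damping * np.eye(grad_out.shape[1])
    # (A (x) G)^{-1} vec(grad_W) corresponds to G^{-1} grad_W A^{-1}
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)

# Toy usage with random data.
rng = np.random.default_rng(0)
acts = rng.standard_normal((32, 5))       # layer inputs, shape (batch, d_in)
grad_out = rng.standard_normal((32, 3))   # output gradients, shape (batch, d_out)
grad_W = grad_out.T @ acts / 32           # weight gradient, shape (d_out, d_in)
update = kfac_precondition(grad_W, acts, grad_out)
```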
Can we remove the square-root in adaptive gradient methods? A second-order perspective
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep
learning architectures, such as transformers. Their diagonal preconditioner is based on the …
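(For context: a toy sketch contrasting the standard Adam-style update, which divides by the square root of the second-moment estimate, with a square-root-free variant that divides by the second moment directly, as the title alludes to. Names and hyperparameters are illustrative; this is not the paper's method.)

```python
import numpy as np

def preconditioned_step(w, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
                        use_sqrt=True):
    """One diagonally preconditioned update. use_sqrt=True gives the familiar
    Adam-style step grad / sqrt(v); use_sqrt=False divides by v directly,
    i.e. a square-root-free second-order-style scaling (illustrative only)."""
    m = b1 * m + (1 - b1) * grad        # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    denom = (np.sqrt(v) if use_sqrt else v) + eps
    return w - lr * m / denom, m, v
```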
Variational Stochastic Gradient Descent for Deep Neural Networks
Optimizing deep neural networks is one of the main tasks in successful deep learning.
Current state-of-the-art optimizers are adaptive gradient-based optimization methods such …
A Geometric Modeling of Occam's Razor in Deep Learning
Why do deep neural networks (DNNs) benefit from very high dimensional parameter
spaces? Their huge parameter complexities vs. stunning performances in practice is all the …
[BOOK][B] Symplectic Numerical Integration at the Service of Accelerated Optimization and Structure-Preserving Dynamics Learning
V Duruisseaux - 2023 - search.proquest.com
Symplectic numerical integrators for Hamiltonian systems form the paramount class of
geometric numerical integrators, and have been very well investigated in the past forty …
StEVE: Adaptive Optimization in a Kronecker-Factored Eigenbasis
JNM Gamboa - openreview.net
Adaptive optimization algorithms such as Adam see widespread use in Deep Learning.
However, these methods rely on diagonal approximations of the preconditioner, losing much …
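(For context: a rough NumPy sketch of the general idea of adapting per-coordinate step sizes in a Kronecker-factored eigenbasis for one linear layer: rotate the gradient into the eigenbases of the two Kronecker factors, rescale by a second-moment estimate held in that basis, and rotate back. All names and arguments are illustrative assumptions, not the paper's algorithm.)

```python
import numpy as np

def kfe_scaled_gradient(grad_W, A, G, second_moment, eps=1e-8):
    """Rescale one linear layer's gradient in the Kronecker-factored eigenbasis.

    A: (d_in, d_in) activation covariance factor, G: (d_out, d_out) gradient
    covariance factor, second_moment: (d_out, d_in) running second-moment
    estimate maintained in the eigenbasis. All names are illustrative."""
    _, UA = np.linalg.eigh(A)              # eigenvectors of the input-side factor
    _, UG = np.linalg.eigh(G)              # eigenvectors of the output-side factor
    g_eig = UG.T @ grad_W @ UA             # rotate the gradient into the eigenbasis
    g_eig /= np.sqrt(second_moment) + eps  # per-coordinate adaptive scaling
    return UG @ g_eig @ UA.T               # rotate back to parameter coordinates
```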