Generalizing Adam to manifolds for efficiently training transformers
B Brantner - arXiv preprint arXiv:2305.16901, 2023 - arxiv.org
One of the primary reasons behind the success of neural networks has been the emergence
of an array of new, highly successful optimizers, perhaps most importantly the Adam …