Communication Efficient Distributed Training with Distributed Lion

B Liu, L Wu, L Chen, K Liang, J Zhu, C Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
The Lion optimizer has been a promising competitor to AdamW for training large AI
models, with advantages in memory, computation, and sample efficiency. In this paper, we …