CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks

J Wang, Y Lu, B Yuan, B Chen… - International …, 2023 - proceedings.mlr.press
Distributed training of foundation models, especially large language models (LLMs), is
communication-intensive and so has heavily relied on centralized data centers with fast …

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …

Fine-tuning language models over slow networks using activation compression with guarantees

J Wang, B Yuan, L Rimanic, Y He, T Dao… - arXiv preprint arXiv …, 2022 - arxiv.org
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …

Exploring the robustness of decentralized training for large language models

L Lu, C Dai, W Tao, B Yuan, Y Sun, P Zhou - arXiv preprint arXiv …, 2023 - arxiv.org
Decentralized training of large language models has emerged as an effective way to
democratize this technology. However, the potential threats associated with this approach …

How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study

A Erben, R Mayer, HA Jacobsen - arXiv preprint arXiv:2306.03163, 2023 - arxiv.org
This paper aims to answer the question: Can deep learning models be cost-efficiently
trained on a global market of spot VMs spanning different data centers and cloud providers …

Semantic parameter matching in Web APIs with Transformer-based question answering

S Kotstein, C Decker - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
OpenAPI, WADL, RAML, and API Blueprint are popular formats for documenting Web APIs.
Although these formats are in general both human and machine-readable, only the part of …

ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment

X Wu, J Rao, W Chen - arXiv preprint arXiv:2403.10504, 2024 - arxiv.org
The advent of the Transformer architecture has propelled the growth of natural language
processing (NLP) models, leading to remarkable achievements in numerous NLP tasks. Yet …

Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training

L Lu, C Dai, W Tao, B Yuan, Y Sun, P Zhou - Forty-first International … - openreview.net
Modern machine learning applications increasingly demand greater computational
resources for training large models. Decentralized training has emerged as an effective …