SANCUS: staleness-aware communication-avoiding full-graph decentralized training in large-scale graph neural networks

J Peng, Z Chen, Y Shao, Y Shen, L Chen… - Proceedings of the VLDB …, 2022 - dl.acm.org
Graph neural networks (GNNs) have emerged due to their success at modeling graph data.
Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs …

BlindFL: Vertical federated machine learning without peeking into your data

F Fu, H Xue, Y Cheng, Y Tao, B Cui - Proceedings of the 2022 …, 2022 - dl.acm.org
Due to rising concerns about privacy protection, how to build machine learning (ML) models
over different data sources with security guarantees is attracting growing attention. Vertical …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

Galvatron: Efficient transformer training over multiple gpus using automatic parallelism

X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer models have achieved state-of-the-art performance across various application
domains and have gradually become the foundation of advanced large deep learning …

HET: scaling out huge embedding model training via cache-enabled distributed framework

X Miao, H Zhang, Y Shi, X Nie, Z Yang, Y Tao… - arXiv preprint arXiv …, 2021 - arxiv.org
Embedding models have been an effective learning paradigm for high-dimensional data.
However, one open issue of embedding models is that their representations (latent factors) …

Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity

H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid growth of parameter sizes, it becomes increasingly challenging to deploy large
generative models, as they typically require large amounts of GPU memory and massive …

DeAR: Accelerating distributed deep learning with fine-grained all-reduce pipelining

L Zhang, S Shi, X Chu, W Wang, B Li… - 2023 IEEE 43rd …, 2023 - ieeexplore.ieee.org
Communication scheduling has been shown to be effective in accelerating distributed
training, as it enables all-reduce communications to be overlapped with backpropagation …
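
The overlap mentioned in this snippet — launching each gradient's all-reduce as soon as backpropagation produces it — can be illustrated with a minimal PyTorch sketch. This is a generic illustration of the idea, not DeAR's fine-grained pipelining; the helper name attach_overlapped_allreduce is hypothetical, and the sketch assumes an already-initialized torch.distributed process group and PyTorch >= 2.1.

    # Minimal sketch: overlap gradient all-reduce with the backward pass.
    # Assumes dist.init_process_group(...) has already been called.
    import torch
    import torch.distributed as dist

    def attach_overlapped_allreduce(model):
        handles = []

        def hook(param):
            # Fires once this parameter's gradient is fully accumulated during
            # backward; launch its all-reduce without blocking the rest of backprop.
            handles.append(
                dist.all_reduce(param.grad, op=dist.ReduceOp.SUM, async_op=True)
            )

        for p in model.parameters():
            if p.requires_grad:
                p.register_post_accumulate_grad_hook(hook)  # PyTorch >= 2.1

        def finish():
            # Wait only for transfers still in flight, then average the sums.
            world_size = dist.get_world_size()
            for h in handles:
                h.wait()
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.div_(world_size)
            handles.clear()

        return finish

    # Hypothetical usage per training step:
    #   finish = attach_overlapped_allreduce(model)  # once, after model creation
    #   loss.backward()                              # all-reduces start during backward
    #   finish()                                     # block only on remaining transfers
    #   optimizer.step()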

SDPipe: A semi-decentralized framework for heterogeneity-aware pipeline-parallel training

X Miao, Y Shi, Z Yang, B Cui, Z Jia - Proceedings of the VLDB …, 2023 - dl.acm.org
The increasing size of both deep learning models and training data necessitates the ability
to scale out model training through pipeline-parallel training, which combines pipelined …

HET-GMP: A graph-based system approach to scaling large embedding model training

X Miao, Y Shi, H Zhang, X Zhang, X Nie… - Proceedings of the …, 2022 - dl.acm.org
Embedding models have been recognized as an effective learning paradigm for high-
dimensional data. However, a major obstacle in embedding model training is that updating …

BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach

Z Zheng, Z Pan, D Wang, K Zhu, W Zhao… - Proceedings of the …, 2023 - dl.acm.org
Compiler optimization plays an increasingly important role in boosting the performance of
machine learning models for data processing and management. With increasingly complex …