From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

MegaBlocks: Efficient sparse training with mixture-of-experts

T Gale, D Narayanan, C Young… - … of Machine Learning …, 2023 - proceedings.mlsys.org
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Janus: A unified distributed training framework for sparse mixture-of-experts models

J Liu, JH Wang, Y Jiang - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
Scaling models to large sizes to improve performance has become a trend in deep learning, and
sparsely activated Mixture-of-Experts (MoE) is a promising architecture to scale models …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference

R Hwang, J Wei, S Cao, C Hwang… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) based on transformers have made significant strides in
recent years, the success of which is driven by scaling up their model size. Despite their high …

CliqueParcel: An approach for batching LLM prompts that jointly optimizes efficiency and faithfulness

J Liu, T Yang, J Neville - arXiv preprint arXiv:2402.14833, 2024 - arxiv.org
Large language models (LLMs) have become pivotal in recent research. However, during
the inference process, LLMs still require substantial resources. In this paper, we propose …

Enabling Large Dynamic Neural Network Training with Learning-based Memory Management

J Ren, D Xu, S Yang, J Zhao, Z Li… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Dynamic neural networks (DyNNs) enable high computational efficiency and strong
representation capability. However, training a DyNN can face a memory capacity problem …

ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models have been easily scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …