From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

MegaBlocks: Efficient sparse training with mixture-of-experts

T Gale, D Narayanan, C Young… - … of Machine Learning …, 2023 - proceedings.mlsys.org
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Janus: A unified distributed training framework for sparse mixture-of-experts models

J Liu, JH Wang, Y Jiang - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
Scaling models to large sizes to improve performance has become a trend in deep learning, and
sparsely activated Mixture-of-Experts (MoE) is a promising architecture to scale models …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have achieved unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference

R Hwang, J Wei, S Cao, C Hwang… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) based on transformers have made significant strides in
recent years, the success of which is driven by scaling up their model size. Despite their high …

CliqueParcel: An approach for batching LLM prompts that jointly optimizes efficiency and faithfulness

J Liu, T Yang, J Neville - arXiv preprint arXiv:2402.14833, 2024 - arxiv.org
Large language models (LLMs) have become pivotal in recent research. However, during
the inference process, LLMs still require substantial resources. In this paper, we propose …

Enabling Large Dynamic Neural Network Training with Learning-based Memory Management

J Ren, D Xu, S Yang, J Zhao, Z Li… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Dynamic neural networks (DyNNs) enable high computational efficiency and strong
representation capability. However, training a DyNN can face a memory capacity problem …

ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models have been easily scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …