From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
A survey on scheduling techniques in computing and network convergence
S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …
MegaBlocks: Efficient sparse training with mixture-of-experts
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …
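Several of the entries in this listing (MegaBlocks, Lina, Janus, ScheMoE) build on sparsely activated Mixture-of-Experts layers. As general background only, here is a minimal PyTorch sketch of top-k expert gating; it illustrates the shared MoE concept, not MegaBlocks' block-sparse kernels or any of these systems' specific designs, and the expert count, top_k, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic sparsely activated MoE layer: each token is routed to its
    top-k experts and expert outputs are combined with the gate weights.
    Real systems add capacity limits, block-sparse kernels, and
    all-to-all communication on top of this basic routing."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                                  # (tokens, experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                   # tokens sent to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TopKMoELayer(d_model=16, d_hidden=32, num_experts=4, top_k=2)
    tokens = torch.randn(8, 16)
    print(layer(tokens).shape)  # torch.Size([8, 16])
```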
Accelerating distributed MoE training and inference with Lina
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …
Janus: A unified distributed training framework for sparse mixture-of-experts models
J Liu, JH Wang, Y Jiang - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
Scaling models to large sizes to improve performance has led to a trend in deep learning, and
sparsely activated Mixture-of-Experts (MoE) is a promising architecture to scale models …
Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference
Large language models (LLMs) based on transformers have made significant strides in
recent years, the success of which is driven by scaling up their model size. Despite their high …
CliqueParcel: An approach for batching LLM prompts that jointly optimizes efficiency and faithfulness
Large language models (LLMs) have become pivotal in recent research. However, during
the inference process, LLMs still require substantial resources. In this paper, we propose …
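The CliqueParcel entry concerns batching LLM prompts for efficient inference. The sketch below shows only the naive, general idea of merging several prompts into a single model call and splitting the response; it is not CliqueParcel's grouping or faithfulness-aware method, and the delimiter, helper names, and stub model are hypothetical.

```python
from typing import Callable, List

# Hypothetical delimiter used to split the batched response back per prompt.
DELIM = "### ANSWER SPLIT ###"

def batch_prompts(prompts: List[str], llm: Callable[[str], str]) -> List[str]:
    """Naive prompt batching: merge several prompts into one request so the
    model is invoked once, then split the single response per prompt."""
    merged = "\n\n".join(
        f"Question {i + 1}: {p}\nAnswer {i + 1}:" for i, p in enumerate(prompts)
    )
    instruction = (
        f"Answer each question below. Separate the answers with the line '{DELIM}'.\n\n{merged}"
    )
    response = llm(instruction)            # one call instead of len(prompts) calls
    answers = [a.strip() for a in response.split(DELIM)]
    answers += [""] * (len(prompts) - len(answers))  # pad if the model returned too few segments
    return answers[: len(prompts)]

if __name__ == "__main__":
    # Stub model, just to show the plumbing end to end.
    stub = lambda text: DELIM.join(f"stub answer {i}" for i in range(2))
    print(batch_prompts(["What is MoE?", "Why batch prompts?"], stub))
```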
Enabling Large Dynamic Neural Network Training with Learning-based Memory Management
Dynamic neural network (DyNN) enables high computational efficiency and strong
representation capability. However, training DyNN can face a memory capacity problem …
ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling
In recent years, large-scale models have been easily scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …