From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Mod-Squad: Designing mixtures of experts as modular multi-task learners

Z Chen, Y Shen, M Ding, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Optimization in multi-task learning (MTL) is more challenging than single-task learning
(STL), as the gradients from different tasks can be contradictory. When tasks are related, it …

AdaMV-MoE: Adaptive multi-task vision mixture-of-experts

T Chen, X Chen, X Du, A Rashwan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely activated Mixture-of-Experts (MoE) is becoming a promising paradigm for
multi-task learning (MTL). Instead of compressing multiple tasks' knowledge into a single …

TaskExpert: Dynamically assembling multi-task representations with memorial mixture-of-experts

H Ye, D Xu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Learning discriminative task-specific features simultaneously for multiple distinct tasks is a
fundamental problem in multi-task learning. Recent state-of-the-art models consider directly …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …
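
As a rough illustration of the sparsely activated MoE pattern these systems build on, the sketch below shows top-k gating in plain NumPy; the names topk_moe_forward, gate_w, and experts are illustrative assumptions, not details of Lina itself.

# Minimal sketch of a sparsely activated (top-k) MoE layer: a gating network
# scores all experts, but only the k best-scoring experts are evaluated,
# which is the source of the compute savings discussed in this literature.
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """x: (d,) token vector; gate_w: (d, E) gating weights; experts: list of E callables."""
    logits = x @ gate_w                       # (E,) routing scores
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    return sum(p * experts[e](x) for p, e in zip(probs, top))

# Toy usage: 4 experts, each a random linear map.
rng = np.random.default_rng(0)
d, E = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(E)]
gate_w = rng.normal(size=(d, E))
y = topk_moe_forward(rng.normal(size=d), gate_w, experts, k=2)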

Demystifying softmax gating function in Gaussian mixture of experts

H Nguyen, TT Nguyen, N Ho - Advances in Neural …, 2023 - proceedings.neurips.cc
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. It is mainly due to three …
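
For context, the softmax-gated Gaussian mixture of experts studied in this line of work is commonly written as below; the notation is generic and the paper's exact parameterization may differ:

p(y \mid x) = \sum_{k=1}^{K} \frac{\exp(\beta_{0k} + \beta_{1k}^\top x)}{\sum_{j=1}^{K} \exp(\beta_{0j} + \beta_{1j}^\top x)} \, \mathcal{N}\!\left(y \mid a_k^\top x + b_k, \sigma_k^2\right),

where the softmax term is the gating probability of expert k and each expert is a Gaussian regression component.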

Psychometry: An omnifit model for image reconstruction from human brain activity

R Quan, W Wang, Z Tian, F Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Reconstructing the viewed images from human brain activity bridges human and computer
vision through the Brain-Computer Interface. The inherent variability in brain function …

Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation

R Zhang, Y Luo, J Liu, H Yang, Z Dong… - Proceedings of the …, 2024 - ojs.aaai.org
The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-
task learning including low-level upstream tasks such as concurrent removal of multiple …

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

Y Yang, PT Jiang, Q Hou, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have
achieved strong performance, but they neglect the importance of explicitly modeling the global …

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts

J Wu, X Hu, Y Wang, B Pang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present Omni-SMoLA, a multimodal architecture that mixes many multimodal
experts efficiently and achieves both high specialist and generalist performance. In contrast …