From Google Gemini to OpenAI Q* (Q-star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
Mod-Squad: Designing mixtures of experts as modular multi-task learners
Optimization in multi-task learning (MTL) is more challenging than single-task learning
(STL), as gradients from different tasks can be contradictory. When tasks are related, it …
AdaMV-MoE: Adaptive multi-task vision mixture-of-experts
Abstract Sparsely activated Mixture-of-Experts (MoE) is becoming a promising paradigm for
multi-task learning (MTL). Instead of compressing multiple tasks' knowledge into a single …
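To make the "sparsely activated" idea in the entry above concrete, the sketch below shows generic top-k MoE routing in PyTorch: a gating network scores all experts, but only the k highest-scoring experts are run per input. This is an illustrative sketch of the general technique, not AdaMV-MoE's specific design; the class name TopKMoE, the layer sizes, the expert count, and k=2 are assumptions made for the example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k sparsely activated MoE layer (illustrative only)."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)          # router scoring each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                                   # x: (batch, d_model)
        scores = self.gate(x)                               # (batch, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1)     # keep only the k best experts
        weights = F.softmax(top_vals, dim=-1)               # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                          # only k experts execute per input
            idx, w = top_idx[:, slot], weights[:, slot:slot + 1]
            for e in idx.unique().tolist():
                sel = idx == e
                out[sel] = out[sel] + w[sel] * self.experts[e](x[sel])
        return out

x = torch.randn(16, 64)
print(TopKMoE()(x).shape)                                   # torch.Size([16, 64])

Because only k of the n_experts sub-networks are evaluated for each input, compute scales with k rather than with the total parameter count, which is the property the MoE papers listed here build on.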
TaskExpert: Dynamically assembling multi-task representations with memorial mixture-of-experts
Learning discriminative task-specific features simultaneously for multiple distinct tasks is a
fundamental problem in multi-task learning. Recent state-of-the-art models consider directly …
Accelerating distributed MoE training and inference with Lina
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …
Demystifying softmax gating function in Gaussian mixture of experts
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has
remained a long-standing open problem in the literature. It is mainly due to three …
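For readers unfamiliar with the model named in this entry, a softmax-gating Gaussian mixture of experts is conventionally written (in generic notation that may differ from the paper's) as

\[
p(y \mid x) \;=\; \sum_{k=1}^{K} \frac{\exp\!\big(\beta_k^{\top} x + \beta_{0k}\big)}{\sum_{j=1}^{K} \exp\!\big(\beta_j^{\top} x + \beta_{0j}\big)}\,
\mathcal{N}\!\big(y \mid a_k^{\top} x + b_k,\ \sigma_k^{2}\big),
\]

where the softmax factor is the input-dependent gate assigning weight to expert k, and each Gaussian term is that expert's regression model with parameters \((a_k, b_k, \sigma_k^2)\).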
Psychometry: An omnifit model for image reconstruction from human brain activity
Reconstructing the viewed images from human brain activity bridges human and computer
vision through the Brain-Computer Interface. The inherent variability in brain function …
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation
The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning including low-level upstream tasks such as concurrent removal of multiple …
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Previous multi-task dense prediction methods based on the Mixture of Experts (MoE) have
achieved strong performance, but they neglect the importance of explicitly modeling the global …
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
In this work, we present Omni-SMoLA, a multimodal architecture that mixes many multimodal
experts efficiently and achieves both high specialist and generalist performance. In contrast …