OpenMoE: An early effort on open mixture-of-experts language models
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …
FuseMoE: Mixture-of-experts transformers for fleximodal fusion
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …
Statistical perspective of top-k sparse softmax gating mixture of experts
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Dense-to-sparse gating mixture of experts (MoE) has recently become an effective
alternative to the well-known sparse MoE. Rather than fixing the number of activated experts …
On least squares estimation in softmax gating mixture of experts
The mixture of experts (MoE) model is a statistical machine learning design that aggregates
multiple expert networks using a softmax gating function in order to form a more intricate and …
Model compression and efficient inference for large language models: A survey
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …
A general theory for softmax gating multinomial logistic mixture of experts
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating
functions to achieve greater performance in numerous regression and classification …
Newtonian Physics Informed Neural Network (NwPiNN) for Spatio-Temporal Forecast of Visual Data
Machine intelligence has reached great heights and is evident in almost all domains of
science and technology. This work will focus …
A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data
Deep learning models, despite their potential for increasing our understanding of intricate
neuroimaging data, can be hampered by challenges related to interpretability. Multimodal …
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
The softmax gating function is arguably the most popular choice in mixture of experts
modeling. Despite its widespread use in practice, softmax gating may lead to unnecessary …