OpenMoE: An early effort on open mixture-of-experts language models

F Xue, Z Zheng, Y Fu, J Ni, Z Zheng, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …

FuseMoE: Mixture-of-experts transformers for fleximodal fusion

X Han, H Nguyen, C Harris, N Ho, S Saria - arXiv preprint arXiv …, 2024 - arxiv.org
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …

Statistical perspective of top-k sparse softmax gating mixture of experts

H Nguyen, P Akbarian, F Yan, N Ho - arXiv preprint arXiv:2309.13850, 2023 - arxiv.org
The top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …
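
The snippet above describes the routing scheme studied in the paper: only the K largest softmax-gated experts are activated per input, so compute grows with K rather than with the total number of experts. A minimal NumPy sketch of such a gate follows; the linear gating weights, toy experts, and function names are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def topk_softmax_gate(x, W_gate, k):
        """Sparse gating weights: softmax restricted to the k largest gating logits."""
        logits = x @ W_gate                        # one score per expert (illustrative linear gate)
        topk = np.argsort(logits)[-k:]             # indices of the k largest logits
        weights = np.zeros_like(logits)
        z = np.exp(logits[topk] - logits[topk].max())
        weights[topk] = z / z.sum()                # renormalize over the selected experts only
        return weights

    def moe_output(x, W_gate, experts, k=2):
        """Combine only the k activated experts, so cost scales with k, not len(experts)."""
        w = topk_softmax_gate(x, W_gate, k)
        return sum(w[i] * experts[i](x) for i in np.flatnonzero(w))

    # Toy usage: four linear experts on a 3-dimensional input.
    rng = np.random.default_rng(0)
    experts = [lambda x, A=rng.normal(size=(3, 3)): A @ x for _ in range(4)]
    W_gate = rng.normal(size=(3, 4))
    print(moe_output(rng.normal(size=3), W_gate, experts, k=2))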

Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

H Nguyen, P Akbarian, N Ho - arXiv preprint arXiv:2401.13875, 2024 - arxiv.org
The dense-to-sparse gating mixture of experts (MoE) has recently become an effective
alternative to the well-known sparse MoE. Rather than fixing the number of activated experts …
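
The dense-to-sparse mechanism the snippet refers to controls sparsity through a softmax temperature rather than a fixed expert count. A small illustrative sketch (the logits and temperature values are made up, not taken from the paper):

    import numpy as np

    def tempered_softmax(logits, tau):
        """Softmax gating weights with temperature tau."""
        z = np.exp((logits - logits.max()) / tau)
        return z / z.sum()

    logits = np.array([2.0, 1.0, 0.5, -1.0])
    for tau in (10.0, 1.0, 0.1):
        print(f"tau={tau:>4}: {np.round(tempered_softmax(logits, tau), 3)}")
    # Large tau -> nearly uniform (dense) weights; small tau -> nearly one-hot (sparse routing).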

On least squares estimation in softmax gating mixture of experts

H Nguyen, N Ho, A Rinaldo - arXiv preprint arXiv:2402.02952, 2024 - arxiv.org
The mixture of experts (MoE) model is a statistical machine learning design that aggregates
multiple expert networks using a softmax gating function in order to form a more intricate and …
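
The snippet sketches the model under study: a convex combination of expert networks weighted by a softmax gate, fit by least squares. A minimal version with linear gates and linear experts, purely for illustration:

    import numpy as np

    def moe_regression(x, gate_W, gate_b, expert_W, expert_b):
        """f(x) = sum_k softmax(gate_W x + gate_b)_k * (expert_W[k] x + expert_b[k])."""
        logits = gate_W @ x + gate_b
        gates = np.exp(logits - logits.max())
        gates /= gates.sum()
        expert_outs = expert_W @ x + expert_b      # one scalar output per expert
        return gates @ expert_outs

    def least_squares_loss(X, y, params):
        """Mean squared error of the MoE regression function over a dataset."""
        preds = np.array([moe_regression(x, *params) for x in X])
        return np.mean((y - preds) ** 2)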

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

A general theory for softmax gating multinomial logistic mixture of experts

H Nguyen, P Akbarian, TT Nguyen, N Ho - arXiv preprint arXiv:2310.14188, 2023 - arxiv.org
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating
functions to achieve greater performance in numerous regression and classification …
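
For the classification setting the snippet mentions, each expert is itself a multinomial logistic model and the softmax gate mixes their class-probability vectors. A minimal sketch (the linear experts and gate are assumptions for illustration):

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def logistic_moe_predict(x, gate_W, expert_Ws):
        """P(y = c | x): gate-weighted mixture of each expert's class probabilities."""
        gates = softmax(gate_W @ x)                                   # (K,) mixture weights
        class_probs = np.stack([softmax(W @ x) for W in expert_Ws])   # (K, C) per-expert probabilities
        return gates @ class_probs                                    # (C,) mixed class probabilities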

Newtonian Physics Informed Neural Network (NwPiNN) for Spatio-Temporal Forecast of Visual Data

A Dutta, K Lakshmanan, S Kumar… - Human-Centric Intelligent …, 2024 - Springer
Machine intelligence has reached great heights and has proven its effectiveness in almost
all domains of science and technology. This work will focus …

A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data

Y Bi, A Abrol, Z Fu, V Calhoun - bioRxiv, 2023 - biorxiv.org
Deep learning models, despite their potential for increasing our understanding of intricate
neuroimaging data, can be hampered by challenges related to interpretability. Multimodal …

Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts

H Nguyen, N Ho, A Rinaldo - arXiv preprint arXiv:2405.13997, 2024 - arxiv.org
The softmax gating function is arguably the most popular choice in mixture of experts
modeling. Despite its widespread use in practice, softmax gating may lead to unnecessary …
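
The contrast the paper draws can be seen directly in the two gating functions: softmax couples all experts through a shared normalizer, while sigmoid scores each expert independently. A toy comparison on made-up logits:

    import numpy as np

    def softmax_gate(logits):
        z = np.exp(logits - logits.max())
        return z / z.sum()                      # weights compete and sum to 1

    def sigmoid_gate(logits):
        return 1.0 / (1.0 + np.exp(-logits))    # each weight is assigned independently

    logits = np.array([1.5, 0.2, -0.7])
    print("softmax:", np.round(softmax_gate(logits), 3))
    print("sigmoid:", np.round(sigmoid_gate(logits), 3))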