OpenMoE: An early effort on open mixture-of-experts language models
To help the open-source community have a better understanding of Mixture-of-Experts
(MoE) based large language models (LLMs), we train and release OpenMoE, a series of …
FuseMoE: Mixture-of-experts transformers for fleximodal fusion
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …
Statistical perspective of top-k sparse softmax gating mixture of experts
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Dense-to-sparse gating mixture of experts (MoE) has recently become an effective
alternative to the well-known sparse MoE. Rather than fixing the number of activated experts …
On least squares estimation in softmax gating mixture of experts
The mixture of experts (MoE) model is a statistical machine learning design that aggregates
multiple expert networks using a softmax gating function in order to form a more intricate and …
Model compression and efficient inference for large language models: A survey
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …
A general theory for softmax gating multinomial logistic mixture of experts
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating
functions to achieve greater performance in numerous regression and classification …
Newtonian Physics Informed Neural Network (NwPiNN) for Spatio-Temporal Forecast of Visual Data
Machine intelligence has reached great heights and is evident in almost all domains of
science and technology. This work will focus …
A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data
Deep learning models, despite their potential for increasing our understanding of intricate
neuroimaging data, can be hampered by challenges related to interpretability. Multimodal …
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
The softmax gating function is arguably the most popular choice in mixture of experts
modeling. Despite its widespread use in practice, softmax gating may lead to unnecessary …