From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

Scaling vision-language models with sparse mixture of experts

S Shen, Z Yao, C Li, T Darrell, K Keutzer… - arXiv preprint arXiv …, 2023 - arxiv.org
The field of natural language processing (NLP) has made significant strides in recent years,
particularly in the development of large-scale vision-language models (VLMs). These …

Trends and challenges of real-time learning in large language models: A critical review

M Jovanovic, P Voss - arXiv preprint arXiv:2404.18311, 2024 - arxiv.org
Real-time learning concerns the ability of learning systems to acquire knowledge over time,
enabling their adaptation and generalization to novel tasks. It is a critical ability for …

Conpet: Continual parameter-efficient tuning for large language models

C Song, X Han, Z Zeng, K Li, C Chen, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Continual learning necessitates the continual adaptation of models to newly emerging tasks
while minimizing the catastrophic forgetting of old ones. This is extremely challenging for …

Enable language models to implicitly learn self-improvement from data

Z Wang, L Hou, T Lu, Y Wu, Y Li, H Yu, H Ji - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended
text generation tasks. However, the inherent open-ended nature of these tasks implies that …

TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

NK Corrêa, S Falk, S Fatimah, A Sen… - Machine Learning with …, 2024 - Elsevier
Large language models (LLMs) have significantly advanced natural language processing,
but their progress has yet to be equal across languages. While most LLMs are trained in …

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

B Pan, Y Shen, H Liu, M Mishra, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4×
compared to dense models without sacrificing performance, making them more efficient in …

Pushing The Limit of LLM Capacity for Text Classification

Y Zhang, M Wang, C Ren, Q Li, P Tiwari… - arXiv preprint arXiv …, 2024 - arxiv.org
The future research value of text classification has encountered challenges and
uncertainties, due to the extraordinary efficacy demonstrated by large language models …

From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape

C Xiao, W Ma, SX Xu, K Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Receiving immediate and personalized feedback is crucial for second-language learners,
and Automated Essay Scoring (AES) systems are a vital resource when human instructors …