Scaling instruction-finetuned language models

HW Chung, L Hou, S Longpre, B Zoph, Y Tay… - Journal of Machine Learning Research, 2024 - jmlr.org
Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

Specializing smaller language models towards multi-step reasoning

Y Fu, H Peng, L Ou, A Sabharwal… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
The surprising ability of Large Language Models (LLMs) to perform well on complex
reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very …

FLASK: Fine-grained language model evaluation based on alignment skill sets

S Ye, D Kim, S Kim, H Hwang, S Kim, Y Jo… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluation of Large Language Models (LLMs) is challenging because aligning to human
values requires the composition of multiple skills and the required set of skills varies …

The efficiency spectrum of large language models: An algorithmic survey

T Ding, T Chen, H Zhu, J Jiang, Y Zhong… - arXiv preprint arXiv …, 2023 - researchgate.net
The rapid growth of Large Language Models (LLMs) has been a driving force in
transforming various domains, reshaping the artificial general intelligence landscape …

Adapting large language models for document-level machine translation

M Wu, TT Vu, L Qu, G Foster, G Haffari - arXiv preprint arXiv:2401.06468, 2024 - arxiv.org
Large language models (LLMs) have made significant strides in various natural language
processing (NLP) tasks. Recent research shows that moderately-sized LLMs often …

Defining a new NLP playground

S Li, C Han, P Yu, C Edwards, M Li, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent explosion of performance of large language models (LLMs) has changed the
field of Natural Language Processing (NLP) more abruptly and seismically than any other …

Exploring the numerical reasoning capabilities of language models: A comprehensive analysis on tabular data

M Akhtar, A Shankarampeta, V Gupta, A Patil… - arXiv preprint arXiv …, 2023 - arxiv.org
Numbers are crucial for various real-world domains such as finance, economics, and
science. Thus, understanding and reasoning with numbers are essential skills for language …

Mixture-of-Linear-Experts for Long-term Time Series Forecasting

R Ni, Z Lin, S Wang, G Fanti - International Conference on Artificial Intelligence and Statistics, 2024 - proceedings.mlr.press
Long-term time series forecasting (LTSF) aims to predict future values of a time series given
the past values. The current state-of-the-art (SOTA) on this problem is attained in some …

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

G Chen, X Zhao, T Chen, Y Cheng - arXiv preprint arXiv:2406.11353, 2024 - arxiv.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for
scaling up large language models (LLMs). However, the reliability assessment of MoE lags …