A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

Retrieval-augmented mixture of LoRA experts for uploadable machine learning

Z Zhao, L Gan, G Wang, Y Hu, T Shen, H Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models
(LLMs). Its modular and plug-and-play nature allows the integration of various domain …
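As a rough illustration of the general idea mentioned in this snippet (gating several plug-and-play LoRA adapters over a frozen base layer), the sketch below shows a minimal mixture of LoRA experts in PyTorch. It is not the paper's retrieval-augmented method; the shapes, ranks, and softmax gate are assumptions made only for the example.

    # Minimal sketch (illustrative only): a frozen linear layer whose output is
    # adjusted by a gated sum of several LoRA adapters. Shapes, ranks, and the
    # softmax gate are assumptions, not the paper's retrieval-augmented design.
    import torch
    import torch.nn as nn

    class LoRAExpert(nn.Module):
        # One low-rank adapter: delta_W = B @ A with rank r much smaller than d.
        def __init__(self, d_in, d_out, r=8):
            super().__init__()
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(d_out, r))

        def forward(self, x):
            return x @ self.A.T @ self.B.T

    class MixtureOfLoRA(nn.Module):
        # Frozen base projection plus an input-dependent mixture of LoRA experts.
        def __init__(self, d_in, d_out, n_experts=4, r=8):
            super().__init__()
            self.base = nn.Linear(d_in, d_out)
            for p in self.base.parameters():
                p.requires_grad_(False)  # base weights stay frozen
            self.experts = nn.ModuleList(LoRAExpert(d_in, d_out, r) for _ in range(n_experts))
            self.gate = nn.Linear(d_in, n_experts)

        def forward(self, x):
            w = torch.softmax(self.gate(x), dim=-1)                   # (batch, n_experts)
            out = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, d_out, n_experts)
            return self.base(x) + (out * w.unsqueeze(1)).sum(dim=-1)

    layer = MixtureOfLoRA(d_in=16, d_out=16)
    print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 16])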

Towards modular LLMs by building and reusing a library of LoRAs

O Ostapenko, Z Su, EM Ponti, L Charlin… - arXiv preprint arXiv …, 2024 - arxiv.org
The growing number of parameter-efficient adaptations of a base large language model
(LLM) calls for studying whether we can reuse such trained adapters to improve …

Mixture of Experts Using Tensor Products

Z Su, F Mo, P Tiwari, B Wang, JY Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
In multi-task learning, the conventional approach involves training a model on multiple tasks
simultaneously. However, the training signals from different tasks can interfere with one …

Information Propagation in Modular Language Modeling and Web Tracking

Z Su - 2024 - di.ku.dk
Information propagation is the process through which data are transmitted within a
system. The growth of large-scale web datasets has led to explosive growth in information …