CuMo: Scaling multimodal LLM with co-upcycled mixture-of-experts

Citing articles

Parrot: Multilingual Visual Instruction Tuning

HL Sun, DW Zhou, Y Li, S Lu, C Yi, QG Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has
marked a significant step towards artificial general intelligence. Existing methods mainly …

Cited by: 1

Dense Connector for MLLMs

H Yao, W Wu, T Yang, YX Song, M Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Do we fully leverage the potential of visual encoder in Multimodal Large Language Models
(MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has …

TroL: Traversal of Layers for Large Language and Vision Models

BK Lee, S Chung, CW Kim, B Park, YM Ro - arXiv preprint arXiv …, 2024 - arxiv.org

Large language and vision models (LLVMs) have been driven by the generalization power
of large language models (LLMs) and the advent of visual instruction tuning. Along with …

CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models

J Kim, H Kim, Y Kim, YM Ro - arXiv preprint arXiv:2406.01920, 2024 - arxiv.org

Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual
context understanding and coherent response generation. However, alongside these …