What Makes Multimodal In-Context Learning Work?

FB Baldassini, M Shukor, M Cord… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large Language Models have demonstrated remarkable performance across various tasks, exhibiting the capacity to swiftly acquire new skills, such as through In-Context …

Building and better understanding vision-language models: insights and future directions

H Laurençon, A Marafioti, V Sanh… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of vision-language models (VLMs), which take images and texts as inputs and
output texts, is rapidly evolving and has yet to reach consensus on several key aspects of …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have developed rapidly in recent years. Building on these powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

BT Corradini, M Shukor, P Couairon… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models have exhibited unprecedented capabilities in tackling many domains
and tasks. Models such as CLIP are currently widely used to bridge cross-modal …

A Concept-Based Explainability Framework for Large Multimodal Models

J Parekh, P Khayatan, M Shukor, A Newson… - arXiv preprint arXiv …, 2024 - arxiv.org
Large multimodal models (LMMs) combine unimodal encoders and large language models
(LLMs) to perform multimodal tasks. Despite recent advancements towards the …