Vision Learners Meet Web Image-Text Pairs

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models

L Zhang, X Zhai, Z Zhao, Y Zong… - Proceedings of the …, 2024 - openaccess.thecvf.com

Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating
alternatives to established facts or past events, significantly enhancing our abilities in …

Cited by 16 · All 7 versions

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

B Zhao, H Tu, C Wei, J Mei, C Xie - arXiv preprint arXiv:2312.11420, 2023 - arxiv.org

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into
Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a …

Cited by 24 · All 3 versions

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

H Tu, B Zhao, C Wei, C Xie - arXiv preprint arXiv:2309.07120, 2023 - arxiv.org

Multi-modal large language models (MLLMs) are trained based on large language models
(LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual …

Cited by 16 · All 3 versions

Unsupervised Camouflaged Object Segmentation as Domain Adaptation

Y Zhang, C Wu - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com

Deep learning for unsupervised image segmentation remains challenging due to the
absence of human labels. The common idea is to train a segmentation head, with the …

Cited by 13 · All 5 versions