A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) have recently emerged as a rising research
hotspot; they use powerful Large Language Models (LLMs) as a brain to perform …

Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration

F Fui-Hoon Nah, R Zheng, J Cai, K Siau… - Journal of Information …, 2023 - Taylor & Francis
Artificial intelligence (AI) has elicited much attention across disciplines and industries (Hyder
et al., 2019). AI has been defined as “a system's ability to correctly interpret external data, to …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models with vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose …

ShareGPT4V: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet
often constrained by the scarcity of high-quality image-text data. To address this bottleneck …

Chat-UniVi: Unified visual representation empowers large language models with image and video understanding

P Jin, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

InternLM-XComposer: A vision-language large model for advanced text-image comprehension and composition

P Zhang, X Dong, B Wang, Y Cao, C Xu, L Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose InternLM-XComposer, a vision-language large model that enables advanced
image-text comprehension and composition. The innovative nature of our model is …

How Robust is Google's Bard to Adversarial Image Attacks?

Y Dong, H Chen, J Chen, Z Fang, X Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) that integrate text and other modalities
(especially vision) have achieved unprecedented performance in various multimodal tasks …

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …