Decoding ChatGPT: a taxonomy of existing research, current challenges, and possible future directions

SS Sohail, F Farhat, Y Himeur, M Nadeem… - Journal of King Saud …, 2023 - Elsevier
Abstract Chat Generative Pre-trained Transformer (ChatGPT) has gained significant interest
and attention since its launch in November 2022. It has shown impressive performance in …

Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions

S Atakishiyev, M Salameh, H Yao, R Goebel - IEEE Access, 2024 - ieeexplore.ieee.org
Autonomous driving has achieved significant milestones in research and development over
the last two decades. There is increasing interest in the field as the deployment of …

Self-rewarding language models

W Yuan, RY Pang, K Cho, S Sukhbaatar, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We posit that to achieve superhuman agents, future models require superhuman feedback
in order to provide an adequate training signal. Current approaches commonly train reward …

InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …

Detecting and preventing hallucinations in large vision language models

A Gunjal, J Yin, E Bas - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Instruction tuned Large Vision Language Models (LVLMs) have significantly advanced in
generalizing across a diverse set of multi-modal tasks, especially for Visual Question …

MM1: Methods, analysis & insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Clever Hans or neural theory of mind? Stress testing social reasoning in large language models

N Shapira, M Levy, SH Alavi, X Zhou, Y Choi… - arXiv preprint arXiv …, 2023 - arxiv.org
The escalating debate on AI's capabilities warrants developing reliable metrics to assess
machine "intelligence". Recently, many anecdotal examples were used to suggest that …

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Beyond deep reinforcement learning: A tutorial on generative diffusion models in network optimization

H Du, R Zhang, Y Liu, J Wang, Y Lin, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of
Generative Artificial Intelligence (GAI), demonstrating their versatility and efficacy across a …

DiLu: A knowledge-driven approach to autonomous driving with large language models

L Wen, D Fu, X Li, X Cai, T Ma, P Cai, M Dou… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in autonomous driving have relied on data-driven approaches, which
are widely adopted but face challenges including dataset bias, overfitting, and …