Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Gemini Pro defeated by GPT-4V: Evidence from education

GG Lee, E Latif, L Shi, X Zhai - arXiv preprint arXiv:2401.08660, 2023 - arxiv.org
This study compared the classification performance of Gemini Pro and GPT-4V in
educational settings. Employing visual question answering (VQA) techniques, the study …

GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper does not present a novel method. Instead, it delves into an essential, yet
must-know baseline in light of the latest advancements in Generative Artificial Intelligence …

CoCoT: Contrastive chain-of-thought prompting for large multimodal models with multiple image inputs

D Zhang, J Yang, H Lyu, Z Jin, Y Yao, M Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
When exploring the development of Artificial General Intelligence (AGI), a critical task for
these models involves interpreting and processing information from multiple image inputs …

AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting

Y Wang, X Liu, Y Li, M Chen, C Xiao - arXiv preprint arXiv:2403.09513, 2024 - arxiv.org
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …

ElectionSim: Massive population election simulation powered by large language model driven agents

X Zhang, J Lin, L Sun, W Qi, Y Yang, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The massive population election simulation aims to model the preferences of specific groups
in particular election scenarios. It has garnered significant attention for its potential to …

LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs

Y Li, X Chen, B Hu, M Zhang - arXiv preprint arXiv:2402.13546, 2024 - researchgate.net
Long video understanding is a significant and ongoing challenge in the intersection of
multimedia and artificial intelligence. Employing large language models (LLMs) for …

GPT4Ego: Unleashing the potential of pre-trained models for zero-shot egocentric action recognition

G Dai, X Shu, W Wu, R Yan, J Zhang - arXiv preprint arXiv:2401.10039, 2024 - arxiv.org
Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown
impressive performance in various visual recognition tasks. This advancement paves the …

Representation bias in political sample simulations with large language models

W Qi, H Lyu, J Luo - arXiv preprint arXiv:2407.11409, 2024 - arxiv.org
This study seeks to identify and quantify biases in simulating political samples with Large
Language Models, specifically focusing on vote choice and public opinion. Using the GPT …

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Y Tang, D Shimada, J Bi, C Xu - arXiv preprint arXiv:2403.16276, 2024 - arxiv.org
In everyday communication, humans frequently use speech and gestures to refer to specific
areas or objects, a process known as Referential Dialogue (RD). While prior studies have …