Video understanding with large language models: A survey
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …
Gemini Pro defeated by GPT-4V: Evidence from education
This study compared the classification performance of Gemini Pro and GPT-4V in
educational settings. Employing visual question answering (VQA) techniques, the study …
GPT4Vis: what can GPT-4 do for zero-shot visual recognition?
This paper does not present a novel method. Instead, it delves into an essential, yet must-
know baseline in light of the latest advancements in Generative Artificial Intelligence …
CoCoT: Contrastive chain-of-thought prompting for large multimodal models with multiple image inputs
When exploring the development of Artificial General Intelligence (AGI), a critical task for
these models involves interpreting and processing information from multiple image inputs …
AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …
ElectionSim: Massive population election simulation powered by large language model driven agents
The massive population election simulation aims to model the preferences of specific groups
in particular election scenarios. It has garnered significant attention for its potential to …
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs
Long video understanding is a significant and ongoing challenge in the intersection of
multimedia and artificial intelligence. Employing large language models (LLMs) for …
GPT4Ego: unleashing the potential of pre-trained models for zero-shot egocentric action recognition
Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown
impressive performance in various visual recognition tasks. This advancement paves the …
Representation bias in political sample simulations with large language models
This study seeks to identify and quantify biases in simulating political samples with Large
Language Models, specifically focusing on vote choice and public opinion. Using the GPT …
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
In everyday communication, humans frequently use speech and gestures to refer to specific
areas or objects, a process known as Referential Dialogue (RD). While prior studies have …