Visual instruction tuning with polite flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Recent research has demonstrated that the multi-task fine-tuning of multi-modal Large
Language Models (LLMs) using an assortment of annotated downstream vision-language …

EQA-MX: Embodied question answering using multimodal expression

MM Islam, A Gladstone, R Islam… - The Twelfth International …, 2023 - openreview.net
Humans predominantly use verbal utterances and nonverbal gestures (e.g., eye gaze and
pointing gestures) in their natural interactions. For instance, pointing gestures and verbal …

CAESAR: An embodied simulator for generating multimodal referring expression datasets

MM Islam, R Mirzaiee, A Gladstone… - Advances in Neural …, 2022 - proceedings.neurips.cc
Humans naturally use verbal utterances and nonverbal gestures to refer to various objects
(known as referring expressions) in different interactional scenarios. As collecting …

PATRON: perspective-aware multitask model for referring expression grounding using embodied multimodal cues

MM Islam, A Gladstone, T Iqbal - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Humans naturally use referring expressions with verbal utterances and nonverbal gestures
to refer to objects and events. As these referring expressions can be interpreted differently …

Harnessing the power of multi-task pretraining for ground-truth level natural language explanations

B Plüster, J Ambsdorf, L Braach, JH Lee… - arXiv preprint arXiv …, 2022 - arxiv.org
Natural language explanations promise to offer intuitively understandable explanations of a
neural network's decision process in complex vision-language tasks, as pursued in recent …

Visually Grounded Continual Language Learning with Selective Specialization

K Ahrens, L Bengtson, JH Lee, S Wermter - arXiv preprint arXiv …, 2023 - arxiv.org
A desirable trait of an artificial agent acting in the visual world is to continually learn a
sequence of language-informed tasks while striking a balance between sufficiently …

Neuro-Symbolic Spatio-Temporal Reasoning

JH Lee, M Sioutis, K Ahrens, M Alirezaie… - Compendium of …, 2023 - ebooks.iospress.nl
Knowledge about space and time is necessary to solve problems in the physical
world. Spatio-temporal knowledge, however, is required beyond interacting with the physical …

Knowing Earlier what Right Means to You: A Comprehensive VQA Dataset for Grounding Relative Directions via Multi-Task Learning

K Ahrens, M Kerzel, JH Lee, C Weber… - arXiv preprint arXiv …, 2022 - arxiv.org
Spatial reasoning poses a particular challenge for intelligent agents and is at the same time
a prerequisite for their successful interaction and communication in the physical world. One …