Human-like controllable image captioning with verb-specific semantic roles

L Chen, Z Jiang, J Xiao, W Liu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Controllable Image Captioning (CIC)--generating image descriptions following
designated control signals--has received unprecedented attention over the last few years …

Situation recognition with graph neural networks

R Li, M Tapaswi, R Liao, J Jia… - Proceedings of the …, 2017 - openaccess.thecvf.com
We address the problem of recognizing situations in images. Given an image, the task is to
predict the most salient verb (action), and fill its semantic roles such as who is performing the …

Grounding 'grounding' in NLP

KR Chandu, Y Bisk, AW Black - arXiv preprint arXiv:2106.02192, 2021 - arxiv.org
The NLP community has seen substantial recent interest in grounding to facilitate interaction
between language technologies and the world. However, as a community, we use the term …

Language to Action: Towards Interactive Task Learning with Physical Agents.

JY Chai, Q Gao, L She, S Yang, S Saba-Sadiya, G Xu - IJCAI, 2018 - researchgate.net
Language communication plays an important role in human learning and
knowledge acquisition. With the emergence of a new generation of cognitive robots …

Finding "it": Weakly-supervised reference-aware visual grounding in instructional videos

DA Huang, S Buch, L Dery, A Garg… - Proceedings of the …, 2018 - openaccess.thecvf.com
Grounding textual phrases in visual content with standalone image-sentence pairs is a
challenging task. When we consider grounding in instructional videos, this problem …

Efficient grounding of abstract spatial concepts for natural language interaction with robot platforms

R Paul, J Arkin, D Aksaray, N Roy… - … Journal of Robotics …, 2018 - journals.sagepub.com
Our goal is to develop models that allow a robot to efficiently understand or “ground” natural
language instructions in the context of its world representation. Contemporary approaches …

Interactive learning of grounded verb semantics towards human-robot communication

L She, J Chai - Proceedings of the 55th Annual Meeting of the …, 2017 - aclanthology.org
To enable human-robot communication and collaboration, previous works represent
grounded verb semantics as the potential change of state to the physical world caused by …

Collaborative language grounding toward situated human-robot dialogue

JY Chai, R Fang, C Liu, L She - AI Magazine, 2016 - ojs.aaai.org
To enable situated human-robot dialogue, techniques to support grounded language
communication are essential. One particular challenge is to ground human language to …

Rethinking the two-stage framework for grounded situation recognition

M Wei, L Chen, W Ji, X Yue, TS Chua - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Grounded Situation Recognition (GSR), i.e., recognizing the salient activity (or verb)
category in an image (e.g., buying) and detecting all corresponding semantic roles (e.g., agent …

Unsupervised visual-linguistic reference resolution in instructional videos

DA Huang, JJ Lim, L Fei-Fei… - Proceedings of the …, 2017 - openaccess.thecvf.com
We propose an unsupervised method for reference resolution in instructional videos, where
the goal is to temporally link an entity (e.g., "dressing") to the action (e.g., "mix yogurt") that …