Semiotic schemas: A framework for grounding language in action and perception

D Roy - Artificial Intelligence, 2005 - Elsevier
A theoretical framework for grounding language is introduced that provides a computational
path from sensing and motor action to words and speech acts. The approach combines …

Grounded semantic composition for visual scenes

P Gorniak, D Roy - Journal of Artificial Intelligence Research, 2004 - jair.org
We present a visually-grounded language understanding model based on a study of how
people verbally describe objects in scenes. The emphasis of the model is on the …

Mental imagery for a conversational robot

D Roy, KY Hsiao, N Mavridis - IEEE Transactions on Systems …, 2004 - ieeexplore.ieee.org
To build robots that engage in fluid face-to-face spoken conversations with people, robots
must have ways to connect what they say to what they see. A critical aspect of how language …

Incremental natural language processing for HRI

T Brick, M Scheutz - Proceedings of the ACM/IEEE international …, 2007 - dl.acm.org
Robots that interact with humans face-to-face using natural language need to be responsive
to the way humans use language in those situations. We propose a psychologically-inspired …

Resolving references to objects in photographs using the words-as-classifiers model

D Schlangen, S Zarrieß, C Kennington - arXiv preprint arXiv:1510.02125, 2015 - arxiv.org
A common use of language is to refer to visually present objects. Modelling it in computers
requires modelling the link between language and perception. The "words as classifiers" …

Towards situated speech understanding: Visual context priming of language models

D Roy, N Mukherjee - Computer Speech & Language, 2005 - Elsevier
Fuse is a situated spoken language understanding system that uses visual context to steer
the interpretation of speech. Given a visual scene and a spoken description, the system finds …

From First Contact to Close Encounters: A developmentally deep perceptual system for a humanoid robot

PM Fitzpatrick - 2003 - dspace.mit.edu
This thesis presents a perceptual system for a humanoid robot that integrates abilities such
as object localization and recognition with the deeper developmental machinery required to …

Coupling perception and simulation: Steps towards conversational robotics

K Hsiao, N Mavridis, D Roy - Proceedings 2003 IEEE/RSJ …, 2003 - ieeexplore.ieee.org
Human cognition makes extensive use of visualization and imagination. As a first step
towards giving a robot similar abilities, we have built a robotic system that uses a …

A real-time robotic model of human reference resolution using visual constraints

M Scheutz, K Eberhard, V Andronache - Connection Science, 2004 - Taylor & Francis
Evidence from recent psycholinguistic experiments suggests that humans resolve reference
incrementally in the presence of constraining visual context. In this paper, we present and …

A visual context-aware multimodal system for spoken language processing

N Mukherjee, D Roy - INTERSPEECH, 2003 - Citeseer
Recent psycholinguistic experiments show that acoustic and syntactic aspects of online
speech processing are influenced by visual context through cross-modal influences. During …