Semiotic schemas: A framework for grounding language in action and perception
D Roy - Artificial Intelligence, 2005 - Elsevier
A theoretical framework for grounding language is introduced that provides a computational
path from sensing and motor action to words and speech acts. The approach combines …
Grounded semantic composition for visual scenes
We present a visually-grounded language understanding model based on a study of how
people verbally describe objects in scenes. The emphasis of the model is on the …
Mental imagery for a conversational robot
D Roy, KY Hsiao, N Mavridis - IEEE Transactions on Systems …, 2004 - ieeexplore.ieee.org
To build robots that engage in fluid face-to-face spoken conversations with people, robots
must have ways to connect what they say to what they see. A critical aspect of how language …
Incremental natural language processing for HRI
Robots that interact with humans face-to-face using natural language need to be responsive
to the way humans use language in those situations. We propose a psychologically-inspired …
Resolving references to objects in photographs using the words-as-classifiers model
A common use of language is to refer to visually present objects. Modelling it in computers
requires modelling the link between language and perception. The "words as classifiers" …
Towards situated speech understanding: Visual context priming of language models
D Roy, N Mukherjee - Computer Speech & Language, 2005 - Elsevier
Fuse is a situated spoken language understanding system that uses visual context to steer
the interpretation of speech. Given a visual scene and a spoken description, the system finds …
From First Contact to Close Encounters: A developmentally deep perceptual system for a humanoid robot
PM Fitzpatrick - 2003 - dspace.mit.edu
This thesis presents a perceptual system for a humanoid robot that integrates abilities such
as object localization and recognition with the deeper developmental machinery required to …
Coupling perception and simulation: Steps towards conversational robotics
K Hsiao, N Mavridis, D Roy - Proceedings 2003 IEEE/RSJ …, 2003 - ieeexplore.ieee.org
Human cognition makes extensive use of visualization and imagination. As a first step
towards giving a robot similar abilities, we have built a robotic system that uses a …
A real-time robotic model of human reference resolution using visual constraints
M Scheutz, K Eberhard, V Andronache - Connection Science, 2004 - Taylor & Francis
Evidence from recent psycholinguistic experiments suggests that humans resolve reference
incrementally in the presence of constraining visual context. In this paper, we present and …
A visual context-aware multimodal system for spoken language processing
N Mukherjee, D Roy - INTERSPEECH, 2003 - Citeseer
Recent psycholinguistic experiments show that acoustic and syntactic aspects of online
speech processing are influenced by visual context through cross-modal influences. During …