A trainable spoken language understanding system for visual object selection.

D Roy - Artificial Intelligence, 2005 - Elsevier

A theoretical framework for grounding language is introduced that provides a computational
path from sensing and motor action to words and speech acts. The approach combines …

Cited by 341 · Related articles · All 16 versions

Grounded semantic composition for visual scenes

P Gorniak, D Roy - Journal of Artificial Intelligence Research, 2004 - jair.org

We present a visually-grounded language understanding model based on a study of how
people verbally describe objects in scenes. The emphasis of the model is on the …

Cited by 201 · Related articles · All 22 versions

Mental imagery for a conversational robot

D Roy, KY Hsiao, N Mavridis - IEEE Transactions on Systems …, 2004 - ieeexplore.ieee.org

To build robots that engage in fluid face-to-face spoken conversations with people, robots
must have ways to connect what they say to what they see. A critical aspect of how language …

Cited by 163 · Related articles · All 19 versions

Incremental natural language processing for HRI

T Brick, M Scheutz - Proceedings of the ACM/IEEE international …, 2007 - dl.acm.org

Robots that interact with humans face-to-face using natural language need to be responsive
to the way humans use language in those situations. We propose a psychologically-inspired …

Cited by 118 · Related articles · All 13 versions

Resolving references to objects in photographs using the words-as-classifiers model

D Schlangen, S Zarrieß, C Kennington - arXiv preprint arXiv:1510.02125, 2015 - arxiv.org

A common use of language is to refer to visually present objects. Modelling it in computers
requires modelling the link between language and perception. The "words as classifiers" …

Cited by 57 · Related articles · All 9 versions

Towards situated speech understanding: Visual context priming of language models

D Roy, N Mukherjee - Computer Speech & Language, 2005 - Elsevier

Fuse is a situated spoken language understanding system that uses visual context to steer
the interpretation of speech. Given a visual scene and a spoken description, the system finds …

Cited by 92 · Related articles · All 9 versions

From First Contact to Close Encounters: A developmentally deep perceptual system for a humanoid robot

PM Fitzpatrick - 2003 - dspace.mit.edu

This thesis presents a perceptual system for a humanoid robot that integrates abilities such
as object localization and recognition with the deeper developmental machinery required to …

Cited by 74 · Related articles · All 10 versions

Coupling perception and simulation: Steps towards conversational robotics

K Hsiao, N Mavridis, D Roy - Proceedings 2003 IEEE/RSJ …, 2003 - ieeexplore.ieee.org

Human cognition makes extensive use of visualization and imagination. As a first step
towards giving a robot similar abilities, we have built a robotic system that uses a …

Cited by 51 · Related articles · All 14 versions

A real-time robotic model of human reference resolution using visual constraints

M Scheutz, K Eberhard, V Andronache - Connection Science, 2004 - Taylor & Francis

Evidence from recent psycholinguistic experiments suggests that humans resolve reference
incrementally in the presence of constraining visual context. In this paper, we present and …

Cited by 35 · Related articles · All 11 versions

[PDF] A visual context-aware multimodal system for spoken language processing.

N Mukherjee, D Roy - INTERSPEECH, 2003 - Citeseer

Recent psycholinguistic experiments show that acoustic and syntactic aspects of online
speech processing are influenced by visual context through cross-modal influences. During …

Cited by 18 · Related articles · All 7 versions