SIMMC 2.0: A task-oriented dialog dataset for immersive multimodal conversations

S Kottur, S Moon, A Geramifard… - arXiv preprint arXiv …, 2021 - arxiv.org
Next generation task-oriented dialog systems need to understand conversational contexts
with their perceived surroundings, to effectively help users in the real-world multimodal …

Zero-shot dialogue state tracking via cross-task transfer

Z Lin, B Liu, A Madotto, S Moon, P Crook… - arXiv preprint arXiv …, 2021 - arxiv.org
Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety
of task-oriented dialogue domains without the expense of collecting in-domain data. In this …

Overview of the ninth dialog system technology challenge: DSTC9

C Gunasekara, S Kim, LF D'Haro… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This
edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct …

State graph reasoning for multimodal conversational recommendation

Y Wu, L Liao, G Zhang, W Lei, G Zhao… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Conversational recommendation systems (CRS) attract increasing attention in various
application domains such as retail and travel. They offer an effective way to capture users' …

BiToD: A bilingual multi-domain dataset for task-oriented dialogue modeling

Z Lin, A Madotto, GI Winata, P Xu, F Jiang, Y Hu… - arXiv preprint arXiv …, 2021 - arxiv.org
Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure
progress and develop better conversational agents. However, existing datasets for end-to …

Visual language navigation: A survey and open challenges

SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for well-defined tasks such as image classification …

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org
As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

UniMF: A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems

S Yang, R Zhang, SM Erfani, JH Lau - IJCAI, 2021 - ijcai.org
Knowledge bases (KBs) are usually essential for building practical dialogue
systems. Recently we have seen rapidly growing interest in integrating knowledge bases …

A Textual Dataset for Situated Proactive Response Selection

N Otani, J Araki, HS Kim, E Hovy - … of the 61st Annual Meeting of …, 2023 - aclanthology.org
Recent data-driven conversational models are able to return fluent, consistent, and
informative responses to many kinds of requests and utterances in task-oriented scenarios …

SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams

TL Wu, S Kottur, A Madotto, M Azab… - Proceedings of the …, 2023 - aclanthology.org
Building an AI assistant that can seamlessly converse and instruct humans, in a user-centric
situated scenario, requires several essential abilities: (1) spatial and temporal understanding …