SIMMC 2.0: A task-oriented dialog dataset for immersive multimodal conversations

S Kottur, S Moon, A Geramifard… - arXiv preprint arXiv …, 2021 - arxiv.org
Next generation task-oriented dialog systems need to understand conversational contexts
with their perceived surroundings, to effectively help users in the real-world multimodal …

Zero-shot dialogue state tracking via cross-task transfer

Z Lin, B Liu, A Madotto, S Moon, P Crook… - arXiv preprint arXiv …, 2021 - arxiv.org
Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety
of task-oriented dialogue domains without the expense of collecting in-domain data. In this …

Overview of the ninth dialog system technology challenge: DSTC9

C Gunasekara, S Kim, LF D'Haro… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This
edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct …

State graph reasoning for multimodal conversational recommendation

Y Wu, L Liao, G Zhang, W Lei, G Zhao… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Conversational recommendation systems (CRS) attract increasing attention in various
application domains such as retail and travel. They offer an effective way to capture users' …

BiToD: A bilingual multi-domain dataset for task-oriented dialogue modeling

Z Lin, A Madotto, GI Winata, P Xu, F Jiang, Y Hu… - arXiv preprint arXiv …, 2021 - arxiv.org
Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure
progress and develop better conversational agents. However, existing datasets for end-to …

Visual language navigation: A survey and open challenges

SM Park, YG Kim - Artificial Intelligence Review, 2023 - Springer
With the recent development of deep learning, AI models are widely used in various
domains. AI models show good performance for well-defined tasks such as image classification …

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org
As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

UniMF: A Unified Framework to Incorporate Multimodal Knowledge Bases into End-to-End Task-Oriented Dialogue Systems

S Yang, R Zhang, SM Erfani, JH Lau - IJCAI, 2021 - ijcai.org
Knowledge bases (KBs) are usually essential for building practical dialogue
systems. Recently we have seen rapidly growing interest in integrating knowledge bases …

A Textual Dataset for Situated Proactive Response Selection

N Otani, J Araki, HS Kim, E Hovy - … of the 61st Annual Meeting of …, 2023 - aclanthology.org
Recent data-driven conversational models are able to return fluent, consistent, and
informative responses to many kinds of requests and utterances in task-oriented scenarios …

SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams

TL Wu, S Kottur, A Madotto, M Azab… - Proceedings of the …, 2023 - aclanthology.org
Building an AI assistant that can seamlessly converse and instruct humans, in a user-centric
situated scenario, requires several essential abilities: (1) spatial and temporal understanding …