"Do you follow me?": A Survey of Recent Approaches in Dialogue State Tracking
While communicating with a user, a task-oriented dialogue system has to track the user's
needs at each turn according to the conversation history. This process called dialogue state …
Joyful: Joint modality fusion and graph contrastive learning for multimodal emotion recognition
Multimodal emotion recognition aims to recognize emotions for each utterance of multiple
modalities, which has received increasing attention for its application in human-machine …
Multi-modal Video Dialog State Tracking in the Wild
We present \(\mathbb{MST}_{\mathbb{MIXER}}\), a novel video dialog model
operating over a generic multi-modal state tracking scheme. Current models that claim to …
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Visually-grounded dialog systems, which integrate multiple modes of communication such
as text and visual inputs, have become an increasingly popular area of investigation …
OSCaR: Object State Captioning and State Change Representation
The capability of intelligent models to extrapolate and comprehend changes in object states
is a crucial yet demanding aspect of AI research, particularly through the lens of human …
Cascade context-oriented spatio-temporal attention network for efficient and fine-grained video-grounded dialogues
Video-Grounded Dialogue System (VGDS), focusing on generating reasonable
responses based on multi-turn dialogue contexts and a given video, has received intensive …
HERO: A Multi-modal Approach on Mobile Devices for Visual-Aware Conversational Assistance in Industrial Domains
We present HERO, an artificial assistant designed to communicate with users with both
natural language and images to aid them carrying out procedures in industrial contexts. Our …
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog
A Abdessaied, M von Hochmeister, A Bulling - arXiv preprint arXiv …, 2024 - arxiv.org
We present the Object Language Video Transformer (OLViT), a novel model for video dialog
operating over a multi-modal attention-based dialog state tracker. Existing video dialog …
Talking with Machines: A Comprehensive Survey of Emergent Dialogue Systems
W Tholke - arXiv preprint arXiv:2305.16324, 2023 - arxiv.org
From the earliest experiments in the 20th century to the utilization of large language models
and transformers, dialogue systems research has continued to evolve, playing crucial roles …
Enhancing Augmented Reality Dialogue Systems with Multi-Modal Referential Information
In this paper, we present a novel approach to advancing augmented reality (AR) dialogue
systems, bridging the gap between two-dimensional spaces and immersive virtual …