Machine translation from signed to spoken languages: State of the art and challenges

M De Coster, D Shterionov, M Van Herreweghe… - Universal Access in the …, 2024 - Springer
Automatic translation from signed to spoken languages is an interdisciplinary research
domain at the intersection of computer vision, machine translation (MT), and linguistics …

Gloss attention for gloss-free sign language translation

A Yin, T Zhong, L Tang, W Jin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most sign language translation (SLT) methods to date require the use of gloss annotations to
provide additional supervision information; however, the acquisition of glosses is not easy. To …

Exploring group video captioning with efficient relational approximation

W Lin, T Jin, Y Wang, W Pan, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Current video captioning efforts mostly focus on describing a single video, while the need for
captioning videos in groups has increased considerably. In this study, we propose a new …

From rule-based models to deep learning transformers architectures for natural language processing and sign language translation systems: survey, taxonomy and …

N Shahin, L Ismail - Artificial Intelligence Review, 2024 - Springer
With the growing Deaf and Hard of Hearing population worldwide and the persistent
shortage of certified sign language interpreters, there is a pressing need for an efficient …

Multi-granularity relational attention network for audio-visual question answering

L Li, T Jin, W Lin, H Jiang, W Pan… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Recent methods for video question answering (VideoQA), aiming to generate answers
based on given questions and video content, have made significant progress in cross-modal …

Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts

D Fu, X Cheng, X Yang, W Hanting, Z Zhao… - Proceedings of the 32nd …, 2024 - dl.acm.org
In the burgeoning field of Audio-Visual Speech Recognition (AVSR), extant research has
predominantly concentrated on the training paradigms tailored for high-quality resources …

Contrastive token-wise meta-learning for unseen performer visual temporal-aligned translation

L Li, T Jin, X Cheng, Y Wang, W Lin… - Findings of the …, 2023 - aclanthology.org
Visual temporal-aligned translation aims to transform the visual sequence into natural
words, including important applicable tasks such as lipreading and fingerspelling …

Opensr: Open-modality speech recognition via maintaining multi-modality alignment

X Cheng, T Jin, L Li, W Lin, X Duan, Z Zhao - arXiv preprint arXiv …, 2023 - arxiv.org
Speech recognition builds a bridge between multimedia streams (audio-only, visual-only,
or audio-visual) and the corresponding text transcription. However, when training the …

Rethinking Missing Modality Learning from a Decoding Perspective

T Jin, X Cheng, L Li, W Lin, Y Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
The conventional pipeline of multimodal learning consists of three stages: encoding,
fusion, and decoding. Most existing methods under the missing modality condition focus on the …

ASLRing: American Sign Language Recognition with Meta-Learning on Wearables

H Zhou, T Lu, K DeHaan… - 2024 IEEE/ACM Ninth …, 2024 - ieeexplore.ieee.org
Sign Language is widely used by over 500 million Deaf and hard of hearing (DHH)
individuals in their daily lives. While prior works made notable efforts to show the feasibility …