Machine translation from signed to spoken languages: State of the art and challenges
Automatic translation from signed to spoken languages is an interdisciplinary research
domain at the intersection of computer vision, machine translation (MT), and linguistics …
Gloss attention for gloss-free sign language translation
Most sign language translation (SLT) methods to date require the use of gloss annotations to
provide additional supervision information; however, gloss annotations are not easy to acquire. To …
Exploring group video captioning with efficient relational approximation
Current video captioning efforts mostly focus on describing a single video, while the need for
captioning videos in groups has increased considerably. In this study, we propose a new …
From rule-based models to deep learning transformers architectures for natural language processing and sign language translation systems: survey, taxonomy and …
With the growing Deaf and Hard of Hearing population worldwide and the persistent
shortage of certified sign language interpreters, there is a pressing need for an efficient …
Multi-granularity relational attention network for audio-visual question answering
Recent methods for video question answering (VideoQA), aiming to generate answers
based on given questions and video content, have made significant progress in cross-modal …
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts
In the burgeoning field of Audio-Visual Speech Recognition (AVSR), extant research has
predominantly concentrated on the training paradigms tailored for high-quality resources …
Contrastive token-wise meta-learning for unseen performer visual temporal-aligned translation
Visual temporal-aligned translation aims to transform a visual sequence into natural
language, with important applications such as lipreading and fingerspelling …
OpenSR: Open-modality speech recognition via maintaining multi-modality alignment
Speech recognition builds a bridge between multimedia streams (audio-only, visual-only,
or audio-visual) and the corresponding text transcription. However, when training the …
Rethinking Missing Modality Learning from a Decoding Perspective
The conventional pipeline of multimodal learning consists of three stages: encoding,
fusion, and decoding. Most existing methods under the missing-modality condition focus on the …
ASLRing: American Sign Language Recognition with Meta-Learning on Wearables
Sign language is widely used by over 500 million Deaf and hard of hearing (DHH)
individuals in their daily lives. While prior works have made notable efforts to show the feasibility …