Reviewing 25 years of continuous sign language recognition research: Advances, challenges, and prospects
Sign language is a form of visual communication employing hand gestures, body
movements, and facial expressions. The growing prevalence of hearing impairment has …
Generative bias for robust visual question answering
The task of Visual Question Answering (VQA) is known to be plagued by the issue
of VQA models exploiting biases within the dataset to make their final predictions. Various …
Semi-supervised image captioning by adversarially propagating labeled data
We present a novel data-efficient semi-supervised framework to improve the generalization
of image captioning models. Constructing a large-scale labeled image captioning dataset is …
Slowfast Network for Continuous Sign Language Recognition
The objective of this work is the effective extraction of spatial and dynamic features for
Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two …
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
Video Temporal Grounding (VTG) aims to identify visual frames in a video clip that match
text queries. Recent studies in VTG employ cross-attention to correlate visual frames and …
Swin-MSTP: Swin transformer with multi-scale temporal perception for continuous sign language recognition
Continuous sign language recognition (CSLR) aims to recognize and interpret sequences of
sign language gestures in videos. Currently, most CSLR frameworks combine spatial feature …
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
In this paper, we propose a new method to enhance compositional understanding in pre-trained
vision and language models (VLMs) without sacrificing performance in zero-shot …
Sign language translation with hierarchical memorized context in question answering scenarios
L Gao, W Feng, P Shi, R Han, D Lin, L Wan - Neural Computing and …, 2024 - Springer
Vision-based sign language translation (SLT) aims to translate sign language videos into
understandable natural language sentences. Current SLT methods ignore the utilization of …
Modeling semantic correlation and hierarchy for real-world wildlife recognition
We explore the challenges of human-in-the-loop frameworks to label wildlife recognition
datasets with a neural network. In wildlife imagery, the main challenges for a model to assist …
A Comparative Study of Continuous Sign Language Recognition Techniques
Continuous Sign Language Recognition (CSLR) focuses on the interpretation of a sequence
of sign language gestures performed continually without pauses. In this study, we conduct …