Gesture recognition in robotic surgery: a review
B van Amsterdam, MJ Clarkson… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Objective: Surgical activity recognition is a fundamental step in computer-assisted
interventions. This paper reviews the state-of-the-art in methods for automatic recognition of …
Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
Revisiting classifier: Transferring vision-language models for video recognition
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …
Open-VCLIP: Transforming CLIP to an open-vocabulary video model via interpolated weight optimization
Contrastive Language-Image Pretraining (CLIP) has demonstrated impressive
zero-shot learning abilities for image understanding, yet limited effort has been made to …
Cross-modal representation learning for zero-shot action recognition
We present a cross-modal Transformer-based framework, which jointly encodes video data
and text labels for zero-shot action recognition (ZSAR). Our model employs a conceptually …
Transferring vision-language models for visual recognition: A classifier perspective
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
Building an open-vocabulary video CLIP model with better architectures, optimization and data
Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in
zero-shot image recognition, limited effort has been made exploring its potential for zero …
Multimodal open-vocabulary video classification via pre-trained vision and language models
Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is
becoming a promising paradigm for open-vocabulary visual recognition. In this work, we …
Zero-shot action recognition with transformer-based video semantic embedding
While video action recognition has been an active area of research for several years,
zero-shot action recognition has only recently started gaining traction. In this work, we propose a …
Alignment-uniformity aware representation learning for zero-shot video classification
Most methods tackle zero-shot video classification by aligning visual-semantic
representations within seen classes, which limits generalization to unseen classes. To …