Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
Revisiting classifier: Transferring vision-language models for video recognition
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …
an important topic in computer vision research. Along with the growth of computational …
Delving into the local: Dynamic inconsistency learning for deepfake video detection
The rapid development of facial manipulation techniques has aroused public concerns in
recent years. Existing deepfake video detection approaches attempt to capture the discrim …
recent years. Existing deepfake video detection approaches attempt to capture the discrim …
Transferring vision-language models for visual recognition: A classifier perspective
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
with limited labeled samples, is a fundamental problem in computer vision research. Recent …
Hierarchical contrastive inconsistency learning for deepfake video detection
With the rapid development of Deepfake techniques, the capacity of generating hyper-
realistic faces has aroused public concerns in recent years. The temporal inconsistency …
realistic faces has aroused public concerns in recent years. The temporal inconsistency …
Ascnet: Self-supervised video representation learning with appearance-speed consistency
We study self-supervised video representation learning, which is a challenging task due to
1) sufficient labels for supervision; 2) unstructured and noisy visual information. Existing …
1) sufficient labels for supervision; 2) unstructured and noisy visual information. Existing …
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Temporal modeling plays a crucial role in understanding video content. To tackle this
problem, previous studies built complicated temporal relations through time sequence …
problem, previous studies built complicated temporal relations through time sequence …
Nsnet: Non-saliency suppression sampler for efficient video recognition
It is challenging for artificial intelligence systems to achieve accurate video recognition
under the scenario of low computation costs. Adaptive inference based efficient video …
under the scenario of low computation costs. Adaptive inference based efficient video …
Multi-scale adaptive network for single image denoising
Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing
cross-scale complementarity. However, existing architectures treat different scale features …
cross-scale complementarity. However, existing architectures treat different scale features …
Temporal action proposal generation with background constraint
Temporal action proposal generation (TAPG) is a challenging task that aims to locate action
instances in untrimmed videos with temporal boundaries. To evaluate the confidence of …
instances in untrimmed videos with temporal boundaries. To evaluate the confidence of …