Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI conference on …, 2023 - ojs.aaai.org
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

Delving into the local: Dynamic inconsistency learning for deepfake video detection

Z Gu, Y Chen, T Yao, S Ding, J Li, L Ma - Proceedings of the AAAI …, 2022 - ojs.aaai.org
The rapid development of facial manipulation techniques has aroused public concerns in
recent years. Existing deepfake video detection approaches attempt to capture the discrim …

Transferring vision-language models for visual recognition: A classifier perspective

W Wu, Z Sun, Y Song, J Wang, W Ouyang - International Journal of …, 2024 - Springer
Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …

Hierarchical contrastive inconsistency learning for deepfake video detection

Z Gu, T Yao, Y Chen, S Ding, L Ma - European Conference on Computer …, 2022 - Springer
With the rapid development of Deepfake techniques, the capacity of generating hyper-
realistic faces has aroused public concerns in recent years. The temporal inconsistency …

ASCNet: Self-supervised video representation learning with appearance-speed consistency

D Huang, W Wu, W Hu, X Liu, D He… - Proceedings of the …, 2021 - openaccess.thecvf.com
We study self-supervised video representation learning, which is a challenging task due to
1) insufficient labels for supervision; 2) unstructured and noisy visual information. Existing …

What Can Simple Arithmetic Operations Do for Temporal Modeling?

W Wu, Y Song, Z Sun, J Wang, C Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal modeling plays a crucial role in understanding video content. To tackle this
problem, previous studies built complicated temporal relations through time sequence …

NSNet: Non-saliency suppression sampler for efficient video recognition

B Xia, W Wu, H Wang, R Su, D He, H Yang… - … on Computer Vision, 2022 - Springer
It is challenging for artificial intelligence systems to achieve accurate video recognition
under the scenario of low computation costs. Adaptive inference based efficient video …

Multi-scale adaptive network for single image denoising

Y Gou, P Hu, J Lv, JT Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing
cross-scale complementarity. However, existing architectures treat different scale features …

Temporal action proposal generation with background constraint

H Yang, W Wu, L Wang, S Jin, B Xia, H Yao… - Proceedings of the …, 2022 - ojs.aaai.org
Temporal action proposal generation (TAPG) is a challenging task that aims to locate action
instances in untrimmed videos with temporal boundaries. To evaluate the confidence of …