Dsanet: Dynamic segment aggregation network for video-level representation learning

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Cited by 81 Related articles All 6 versions

Revisiting classifier: Transferring vision-language models for video recognition

W Wu, Z Sun, W Ouyang - Proceedings of the AAAI conference on …, 2023 - ojs.aaai.org

Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is
an important topic in computer vision research. Along with the growth of computational …

Cited by 95 Related articles All 4 versions

Delving into the local: Dynamic inconsistency learning for deepfake video detection

Z Gu, Y Chen, T Yao, S Ding, J Li, L Ma - Proceedings of the AAAI …, 2022 - ojs.aaai.org

The rapid development of facial manipulation techniques has aroused public concerns in
recent years. Existing deepfake video detection approaches attempt to capture the discrim …

Cited by 86 Related articles All 4 versions

Transferring vision-language models for visual recognition: A classifier perspective

W Wu, Z Sun, Y Song, J Wang, W Ouyang - International Journal of …, 2024 - Springer

Transferring knowledge from pre-trained deep models for downstream tasks, particularly
with limited labeled samples, is a fundamental problem in computer vision research. Recent …

Cited by 18 Related articles All 3 versions

Hierarchical contrastive inconsistency learning for deepfake video detection

Z Gu, T Yao, Y Chen, S Ding, L Ma - European Conference on Computer …, 2022 - Springer

With the rapid development of Deepfake techniques, the capacity of generating hyper-
realistic faces has aroused public concerns in recent years. The temporal inconsistency …

Cited by 37 Related articles All 3 versions

Ascnet: Self-supervised video representation learning with appearance-speed consistency

D Huang, W Wu, W Hu, X Liu, D He… - Proceedings of the …, 2021 - openaccess.thecvf.com

We study self-supervised video representation learning, which is a challenging task due to
1) insufficient labels for supervision; 2) unstructured and noisy visual information. Existing …

Cited by 51 Related articles All 8 versions

What Can Simple Arithmetic Operations Do for Temporal Modeling?

W Wu, Y Song, Z Sun, J Wang, C Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Temporal modeling plays a crucial role in understanding video content. To tackle this
problem, previous studies built complicated temporal relations through time sequence …

Cited by 9 Related articles All 5 versions

Nsnet: Non-saliency suppression sampler for efficient video recognition

B Xia, W Wu, H Wang, R Su, D He, H Yang… - … on Computer Vision, 2022 - Springer

It is challenging for artificial intelligence systems to achieve accurate video recognition
under the scenario of low computation costs. Adaptive inference based efficient video …

Cited by 22 Related articles All 5 versions

Multi-scale adaptive network for single image denoising

Y Gou, P Hu, J Lv, JT Zhou… - Advances in Neural …, 2022 - proceedings.neurips.cc

Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing
cross-scale complementarity. However, existing architectures treat different scale features …

Cited by 30 Related articles All 6 versions

Temporal action proposal generation with background constraint

H Yang, W Wu, L Wang, S Jin, B Xia, H Yao… - Proceedings of the …, 2022 - ojs.aaai.org

Temporal action proposal generation (TAPG) is a challenging task that aims to locate action
instances in untrimmed videos with temporal boundaries. To evaluate the confidence of …

Cited by 30 Related articles All 5 versions