ViGT: proposal-free video grounding with a learnable token in the transformer

F Wang, D Guo, K Li, M Wang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Video Motion Magnification (VMM) aims to break the resolution limit of human visual
perception capability and reveal the imperceptible minor motion that contains valuable …

Cited by 16 · Related articles · All 3 versions

[PDF] thecvf.com

Structure-Guided Adversarial Training of Diffusion Models

L Yang, H Qian, Z Zhang, J Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …

Cited by 5 · Related articles · All 4 versions

[PDF] arxiv.org

Causal reasoning in typical computer vision tasks

K Zhang, Q Sun, CQ Zhao, Y Tang - Science China Technological …, 2024 - Springer

Deep learning has revolutionized the field of artificial intelligence. Based on the statistical
correlations uncovered by deep learning-based methods, computer vision tasks, such as …

Cited by 2 · Related articles · All 3 versions

[PDF] thecvf.com

Frequency decoupling for motion magnification via multi-level isomorphic architecture

F Wang, D Guo, K Li, Z Zhong… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion
information of objects in the macroscopic world. Prior methods directly model the motion …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

Benchmarking Micro-action Recognition: Dataset, Method, and Application

D Guo, K Li, B Hu, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity
movement. It offers insights into the feelings and intentions of individuals and is important for …

Cited by 50 · Related articles · All 3 versions

[PDF] arxiv.org

Dual-path tokenlearner for remote photoplethysmography-based physiological measurement with facial videos

W Qian, D Guo, K Li, X Zhang, X Tian… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Remote photoplethysmography (rPPG)-based physiological measurement is an emerging
yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from …

Cited by 11 · Related articles · All 4 versions

[PDF] aaai.org

Object-aware adaptive-positivity learning for audio-visual question answering

Z Li, D Guo, J Zhou, J Zhang, M Wang - Proceedings of the AAAI …, 2024 - ojs.aaai.org

This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to
answer questions derived from untrimmed audible videos. To generate accurate answers …

Cited by 9 · Related articles · All 3 versions

[PDF] acm.org

Gloss-driven Conditional Diffusion Models for Sign Language Production

S Tang, F Xue, J Wu, S Wang, R Hong - ACM Transactions on …, 2024 - dl.acm.org

Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …

Cited by 3 · Related articles · All 2 versions

[PDF] arxiv.org

Exploiting Diverse Feature for Multimodal Sentiment Analysis

J Li, W Qian, K Li, Q Li, D Guo, M Wang - Proceedings of the 4th on …, 2023 - dl.acm.org

In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the
MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation …

Cited by 1 · Related articles · All 3 versions

Domain generalized federated learning for Person Re-identification

F Liu, M Ye, B Du - Computer Vision and Image Understanding, 2024 - Elsevier

In the field of Person Re-identification (ReID), addressing the demands of practical
applications in diverse and uncontrollable unseen domains necessitates a focus on Domain …

Cited by 3 · Related articles