EulerMormer: Robust Eulerian motion magnification via dynamic filtering within transformer

F Wang, D Guo, K Li, M Wang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Video Motion Magnification (VMM) aims to break the resolution limit of human visual
perception capability and reveal the imperceptible minor motion that contains valuable …

Structure-Guided Adversarial Training of Diffusion Models

L Yang, H Qian, Z Zhang, J Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …

Causal reasoning in typical computer vision tasks

K Zhang, Q Sun, CQ Zhao, Y Tang - Science China Technological …, 2024 - Springer
Deep learning has revolutionized the field of artificial intelligence. Based on the statistical
correlations uncovered by deep learning-based methods, computer vision tasks, such as …

Frequency decoupling for motion magnification via multi-level isomorphic architecture

F Wang, D Guo, K Li, Z Zhong… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion
information of objects in the macroscopic world. Prior methods directly model the motion …

Benchmarking Micro-action Recognition: Dataset, Method, and Application

D Guo, K Li, B Hu, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity
movement. It offers insights into the feelings and intentions of individuals and is important for …

Dual-path TokenLearner for remote photoplethysmography-based physiological measurement with facial videos

W Qian, D Guo, K Li, X Zhang, X Tian… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Remote photoplethysmography (rPPG)-based physiological measurement is an emerging
yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from …

Object-aware adaptive-positivity learning for audio-visual question answering

Z Li, D Guo, J Zhou, J Zhang, M Wang - Proceedings of the AAAI …, 2024 - ojs.aaai.org
This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to
answer questions derived from untrimmed audible videos. To generate accurate answers …

Gloss-driven Conditional Diffusion Models for Sign Language Production

S Tang, F Xue, J Wu, S Wang, R Hong - ACM Transactions on …, 2024 - dl.acm.org
Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …

Exploiting Diverse Feature for Multimodal Sentiment Analysis

J Li, W Qian, K Li, Q Li, D Guo, M Wang - Proceedings of the 4th on …, 2023 - dl.acm.org
In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the
MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation …

Domain generalized federated learning for Person Re-identification

F Liu, M Ye, B Du - Computer Vision and Image Understanding, 2024 - Elsevier
In the field of Person Re-identification (ReID), addressing the demands of practical
applications in diverse and uncontrollable unseen domains necessitates a focus on Domain …