EulerMormer: Robust Eulerian motion magnification via dynamic filtering within transformer
Video Motion Magnification (VMM) aims to break the resolution limit of human visual
perception capability and reveal the imperceptible minor motion that contains valuable …
Structure-Guided Adversarial Training of Diffusion Models
Diffusion models have demonstrated exceptional efficacy in various generative applications.
While existing models focus on minimizing a weighted sum of denoising score matching …
Causal reasoning in typical computer vision tasks
Deep learning has revolutionized the field of artificial intelligence. Based on the statistical
correlations uncovered by deep learning-based methods, computer vision tasks, such as …
Frequency decoupling for motion magnification via multi-level isomorphic architecture
Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion
information of objects in the macroscopic world. Prior methods directly model the motion …
Benchmarking Micro-action Recognition: Dataset, Method, and Application
Micro-action is an imperceptible non-verbal behaviour characterised by low-intensity
movement. It offers insights into the feelings and intentions of individuals and is important for …
Dual-path tokenlearner for remote photoplethysmography-based physiological measurement with facial videos
Remote photoplethysmography (rPPG)-based physiological measurement is an emerging
yet crucial vision task, whose challenge lies in exploring accurate rPPG prediction from …
Object-aware adaptive-positivity learning for audio-visual question answering
This paper focuses on the Audio-Visual Question Answering (AVQA) task that aims to
answer questions derived from untrimmed audible videos. To generate accurate answers …
Gloss-driven Conditional Diffusion Models for Sign Language Production
Sign Language Production (SLP) aims to convert text or audio sentences into sign language
videos corresponding to their semantics, which is challenging due to the diversity and …
Exploiting Diverse Feature for Multimodal Sentiment Analysis
In this paper, we present our solution to the MuSe-Personalisation sub-challenge in the
MuSe 2023 Multimodal Sentiment Analysis Challenge. The task of MuSe-Personalisation …
Domain generalized federated learning for Person Re-identification
In the field of Person Re-identification (ReID), addressing the demands of practical
applications in diverse and uncontrollable unseen domains necessitates a focus on Domain …