Spatial-temporal knowledge-embedded transformer for video scene graph generation
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer
their relationships for a given video. It requires not only a comprehensive understanding of …
Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression Recognition
Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition
(CD-FER) due to the distribution variation between the source and target domains. Current …
Category-Adaptive Label Discovery and Noise Rejection for Multi-label Recognition with Partial Positive Labels
As a cost-effective alternative to standard multi-label learning, the multi-label image
recognition with partial positive labels (MLR-PPL) task attracts increasing attention, in which …
An attention mechanism network based on the winner-take-all
H Li, S Zhang, D Ma, W Mo - Digital Signal Processing, 2024 - Elsevier
In the field of neurology, neurons compete for brain attention in winner-take-all (WTA)
competitions. Inspired by this, we propose a new WTA-based attention network (called …
DifBFSR: Blind Face Super-Resolution via Conditional Diffusion Contraction
W Yu, Z Li, Q Liu, Y Chen, S Zhang, J Lin - Computing and Informatics, 2024 - cai.sk
Blind Face Super-Resolution (BFSR) has recently gained widespread attention,
which aims to super-resolve Low-Resolution (LR) face images with complex unknown …
Flair: A conditional diffusion framework with applications to face video restoration
Face video restoration (FVR) is a challenging but important problem where one seeks to
recover perceptually realistic face videos from a low-quality input. While diffusion …