A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods

B Pan, K Hirota, Z Jia, Y Dai - Neurocomputing, 2023 - Elsevier
Affective computing is one of the most important research fields in modern human–computer
interaction (HCI). The goal of affective computing is to study and develop the theories …

TransZero: Attribute-guided transformer for zero-shot learning

S Chen, Z Hong, Y Liu, GS Xie, B Sun, H Li… - Proceedings of the …, 2022 - ojs.aaai.org
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic
knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute …

Heterogeneous semantic transfer for multi-label recognition with partial labels

T Chen, T Pu, L Liu, Y Shi, Z Yang, L Lin - International Journal of …, 2024 - Springer
Multi-label image recognition with partial labels (MLR-PL), in which some labels are known
while others are unknown for each image, may greatly reduce the cost of annotation and …

Semantic-aware representation blending for multi-label image recognition with partial labels

T Pu, T Chen, H Wu, L Lin - Proceedings of the AAAI conference on …, 2022 - ojs.aaai.org
Training the multi-label image recognition models with partial labels, in which merely some
labels are known while others are unknown for each image, is a considerably challenging …

MAGIC: Multimodal relational graph adversarial inference for diverse and unpaired text-based image captioning

W Zhang, H Shi, J Guo, S Zhang, Q Cai, J Li… - Proceedings of the …, 2022 - ojs.aaai.org
Text-based image captioning (TextCap) requires simultaneous comprehension of visual
content and reading the text of images to generate a natural language description. Although …

Understanding self-attention mechanism via dynamical system perspective

Z Huang, M Liang, J Qin, S Zhong… - Proceedings of the …, 2023 - openaccess.thecvf.com
The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence
and has successfully boosted the performance of different models. However, current …

FG-AGR: Fine-grained associative graph representation for facial expression recognition in the wild

C Li, X Li, X Wang, D Huang, Z Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Facial expression recognition (FER) in the wild is challenging due to various unconstrained
conditions, i.e., occlusions and head pose variations. Previous methods tend to improve the …

Spatial-temporal knowledge-embedded transformer for video scene graph generation

T Pu, T Chen, H Wu, Y Lu, L Lin - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer
their relationships for a given video. It requires not only a comprehensive understanding of …

Multi-stage spatio-temporal aggregation transformer for video person re-identification

Z Tang, R Zhang, Z Peng, J Chen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In recent years, the Transformer architecture has shown its superiority in the video-based
person re-identification task. Inspired by video representation learning, these methods …

RestoreFormer++: Towards real-world blind face restoration from undegraded key-value pairs

Z Wang, J Zhang, T Chen, W Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Blind face restoration aims at recovering high-quality face images from those with unknown
degradations. Current algorithms mainly introduce priors to complement high-quality details …