A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods
B Pan, K Hirota, Z Jia, Y Dai - Neurocomputing, 2023 - Elsevier
Affective computing is one of the most important research fields in modern human–computer
interaction (HCI). The goal of affective computing is to study and develop the theories …
TransZero: Attribute-guided transformer for zero-shot learning
Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic
knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute …
Heterogeneous semantic transfer for multi-label recognition with partial labels
Multi-label image recognition with partial labels (MLR-PL), in which some labels are known
while others are unknown for each image, may greatly reduce the cost of annotation and …
Semantic-aware representation blending for multi-label image recognition with partial labels
Training the multi-label image recognition models with partial labels, in which merely some
labels are known while others are unknown for each image, is a considerably challenging …
MAGIC: Multimodal relational graph adversarial inference for diverse and unpaired text-based image captioning
Text-based image captioning (TextCap) requires simultaneous comprehension of visual
content and reading the text of images to generate a natural language description. Although …
Understanding self-attention mechanism via dynamical system perspective
The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence
and has successfully boosted the performance of different models. However, current …
FG-AGR: Fine-grained associative graph representation for facial expression recognition in the wild
C Li, X Li, X Wang, D Huang, Z Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Facial expression recognition (FER) in the wild is challenging due to various unconstrained
conditions, i.e., occlusions and head pose variations. Previous methods tend to improve the …
Spatial-temporal knowledge-embedded transformer for video scene graph generation
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer
their relationships for a given video. It requires not only a comprehensive understanding of …
Multi-stage spatio-temporal aggregation transformer for video person re-identification
In recent years, the Transformer architecture has shown its superiority in the video-based
person re-identification task. Inspired by video representation learning, these methods …
RestoreFormer++: Towards real-world blind face restoration from undegraded key-value pairs
Blind face restoration aims at recovering high-quality face images from those with unknown
degradations. Current algorithms mainly introduce priors to complement high-quality details …