Facial affective behavior analysis with instruction tuning

Y Li, A Dao, W Bao, Z Tan, T Chen, H Liu… - European Conference on …, 2025 - Springer
Facial affective behavior analysis (FABA) is crucial for understanding human mental states
from images. However, traditional approaches primarily deploy models to discriminate …

Towards faithful XAI evaluation via generalization-limited backdoor watermark

M Ya, Y Li, T Dai, B Wang, Y Jiang… - The Twelfth International …, 2023 - openreview.net
Saliency-based representation visualization (SRV) (e.g., Grad-CAM) is one of the most
classical and widely adopted explainable artificial intelligence (XAI) methods for its simplicity …

Disentangled explanations of neural network predictions by finding relevant subspaces

P Chormai, J Herrmann, KR Müller… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Explainable AI aims to overcome the black-box nature of complex ML models like neural
networks by generating explanations for their predictions. Explanations often take the form of …

Tuning LayerNorm in Attention: Towards efficient multi-modal LLM finetuning

B Zhao, H Tu, C Wei, J Mei, C Xie - arXiv preprint arXiv:2312.11420, 2023 - arxiv.org
This paper introduces an efficient strategy to transform Large Language Models (LLMs) into
Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a …

Poisoned forgery face: Towards backdoor attacks on face forgery detection

J Liang, S Liang, A Liu, X Jia, J Kuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of face forgery techniques has raised significant concerns within society,
thereby motivating the development of face forgery detection methods. These methods aim …

TA-Cleaner: A fine-grained text alignment backdoor defense strategy for multimodal contrastive learning

Y Xun, S Liang, X Jia, X Liu, X Cao - arXiv preprint arXiv:2409.17601, 2024 - arxiv.org
Pre-trained large models for multimodal contrastive learning, such as CLIP, have been
widely recognized in the industry as highly susceptible to data-poisoned backdoor attacks …

Image Encoding and Fusion of Multi-modal Data Enhance Depression Diagnosis in Parkinson's Disease Patients

J Li, Y Zhao, H Zhang, WJ Li… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The diagnosis of depression in individuals with Parkinson's Disease (PD) through the
utilization of multimodal fusion techniques represents a significant domain. The primary …

Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation

J Qian, M Sun, S Zhou, Z Zhao, R Hun… - arXiv preprint arXiv …, 2024 - arxiv.org
In-context learning (ICL) leverages in-context examples as prompts for the predictions of
Large Language Models (LLMs). These prompts play a crucial role in achieving strong …

Efficient backdoor defense in multimodal contrastive learning: A token-level unlearning method for mitigating threats

K Liu, S Liang, J Liang, P Dai, X Cao - arXiv preprint arXiv:2409.19526, 2024 - arxiv.org
Multimodal contrastive learning uses various data modalities to create high-quality features,
but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor …

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - … on Computer Vision, 2025 - Springer
This work focuses on benchmarking the capabilities of vision large language models
(VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating …