Facial affective behavior analysis with instruction tuning
Facial affective behavior analysis (FABA) is crucial for understanding human mental states
from images. However, traditional approaches primarily deploy models to discriminate …
Towards faithful XAI evaluation via generalization-limited backdoor watermark
Saliency-based representation visualization (SRV) (e.g., Grad-CAM) is one of the most
classical and widely adopted explainable artificial intelligence (XAI) methods for its simplicity …
Disentangled explanations of neural network predictions by finding relevant subspaces
Explainable AI aims to overcome the black-box nature of complex ML models like neural
networks by generating explanations for their predictions. Explanations often take the form of …
Tuning LayerNorm in Attention: Towards efficient multi-modal LLM finetuning
This paper introduces an efficient strategy to transform Large Language Models (LLMs) into
Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a …
Poisoned forgery face: Towards backdoor attacks on face forgery detection
The proliferation of face forgery techniques has raised significant concerns within society,
thereby motivating the development of face forgery detection methods. These methods aim …
TA-Cleaner: A fine-grained text alignment backdoor defense strategy for multimodal contrastive learning
Pre-trained large models for multimodal contrastive learning, such as CLIP, have been
widely recognized in the industry as highly susceptible to data-poisoned backdoor attacks …
Image Encoding and Fusion of Multi-modal Data Enhance Depression Diagnosis in Parkinson's Disease Patients
J Li, Y Zhao, H Zhang, WJ Li… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
The diagnosis of depression in individuals with Parkinson's Disease (PD) through the
utilization of multimodal fusion techniques represents a significant domain. The primary …
Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation
In-context learning (ICL) leverages in-context examples as prompts for the predictions of
Large Language Models (LLMs). These prompts play a crucial role in achieving strong …
Efficient backdoor defense in multimodal contrastive learning: A token-level unlearning method for mitigating threats
Multimodal contrastive learning uses various data modalities to create high-quality features,
but its reliance on extensive data sources on the Internet makes it vulnerable to backdoor …
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
This work focuses on benchmarking the capabilities of vision large language models
(VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating …