BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising
effectiveness in addressing downstream image recognition tasks. However, recent works …
Towards faithful XAI evaluation via generalization-limited backdoor watermark
Saliency-based representation visualization (SRV) (e.g., Grad-CAM) is one of the most
classical and widely adopted explainable artificial intelligence (XAI) methods for its simplicity …
Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers
Given the power of vision transformers, a new learning paradigm, pre-training and then
prompting, makes it more efficient and effective to address downstream visual recognition …
PointNCBW: Towards dataset ownership verification for point clouds via negative clean-label backdoor watermark
Recently, point clouds have been widely used in computer vision, whereas their collection is
time-consuming and expensive. As such, point cloud datasets are the valuable intellectual …
Backdoor attacks on dense passage retrievers for disseminating misinformation
Dense retrievers and retrieval-augmented language models have been widely used in
various NLP applications. Despite being designed to deliver reliable and secure outcomes …
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos
containing pertinent moments in a database. For PRVR, clip modeling is essential to capture …
WaterDiff: Perceptual Image Watermarks Via Diffusion Model
Recent studies have demonstrated that diffusion probabilistic models (DPMs) have
numerous advantages in image generation through learning a decodable latent …
Efficient Self-Supervised Video Hashing with Selective State Spaces
Self-supervised video hashing (SSVH) is a practical task in video indexing and retrieval.
Although Transformers are predominant in SSVH for their impressive temporal modeling …
One Pixel is All I Need
D Siqin, Z Xiaoyi - arXiv preprint arXiv:2412.10681, 2024 - arxiv.org
Vision Transformers (ViTs) have achieved record-breaking performance in various visual
tasks. However, concerns about their robustness against backdoor attacks have grown …