ALIP: Adaptive language-image pre-training with synthetic caption
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the
performance of various vision-language tasks by scaling up the dataset with image-text pairs …
Target before shooting: Accurate anomaly detection and localization under one millisecond via cascade patch retrieval
In this work, by re-examining the "matching" nature of Anomaly Detection (AD), we propose
a new AD framework that simultaneously enjoys new records of AD accuracy and …
Consistent penalizing field loss for zero-shot image retrieval
C Liu, W She, M Chen, X Li, SX Yang - Expert Systems with Applications, 2024 - Elsevier
Zero-shot image retrieval involves retrieving images of unseen classes using a query image
of the same class. To determine whether a given image is of the same class as the query …
Data-Efficient Multimodal Fusion on a Single GPU
The goal of multimodal alignment is to learn a single latent space that is shared between
multimodal inputs. The most powerful models in this space have been trained using massive …
Rethinking Self-supervised Learning for Cross-domain Adversarial Sample Recovery
Adversarial attacks can cause misclassification in machine learning pipelines, posing a
significant safety risk in critical applications such as autonomous systems or medical …
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Current image-text retrieval methods have demonstrated impressive performance in recent
years. However, they still face two problems: the inter-modal matching missing problem and …
Hugs Bring Double Benefits: Unsupervised Cross-Modal Hashing with Multi-granularity Aligned Transformers
Unsupervised cross-modal hashing (UCMH) has been commonly explored to support large-scale
cross-modal retrieval of unlabeled data. Despite promising progress, most existing …
Multimodal Pathology Image Search Between H&E Slides and Multiplexed Immunofluorescent Images
We present an approach for multimodal pathology image search, using dynamic time
warping (DTW) on Variational Autoencoder (VAE) latent space that is fed into a ranked …
Efficient Image Retrieval Using Hierarchical K-Means Clustering
D Park, Y Hwang - Sensors, 2024 - mdpi.com
The objective of content-based image retrieval (CBIR) is to locate samples from a database
that are akin to a query, relying on the content embedded within the images. A contemporary …
Self-Supervised Representation Learning for Adversarial Attack Detection
Supervised learning-based adversarial attack detection methods rely on a large number of
labeled data and suffer significant performance degradation when applying the trained …