Provable dynamic fusion for low-quality multimodal data
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
Change is hard: A closer look at subpopulation shift
Machine learning models often perform poorly on subgroups that are underrepresented in
the training data. Yet, little is understood on the variation in mechanisms that cause …
The benefits of mixup for feature learning
Mixup, a simple data augmentation method that randomly mixes two data points via linear
interpolation, has been extensively applied in various deep learning applications to gain …
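The mixup snippet above describes the core operation: two data points (and their labels) are mixed by linear interpolation. A minimal sketch in NumPy, assuming one-hot labels and a Beta-distributed mixing weight as in the original mixup formulation (function and parameter names here are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2     # linear interpolation of inputs
    y = lam * y1 + (1.0 - lam) * y2     # same interpolation of labels
    return x, y, lam
```

The mixed pair (x, y) is then used as an ordinary training example; smaller alpha concentrates lam near 0 or 1, yielding mixes close to one of the two originals.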
Geodesic multi-modal mixup for robust fine-tuning
Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show
promising results in diverse applications. However, the analysis of learned multi-modal …
Enhancing minority classes by mixing: an adaptative optimal transport approach for long-tailed classification
Real-world data usually confronts severe class-imbalance problems, where several majority
classes have a significantly larger presence in the training set than minority classes. One …
Selective learning: Towards robust calibration with dynamic regularization
Miscalibration in deep learning refers to a discrepancy between a model's predicted
confidence and its actual performance. This problem usually arises from overfitting …
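The confidence/performance discrepancy described in the snippet above is commonly summarized by the expected calibration error (ECE): bin predictions by confidence and take the weighted average of |accuracy − mean confidence| per bin. A minimal sketch (function name and binning scheme are illustrative, not taken from the cited paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average over confidence bins of |bin accuracy - bin confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap   # bin weight = fraction of samples in bin
    return ece
```

An overconfident model (e.g. 100% confidence but 50% accuracy) yields a large ECE, which is the gap that calibration methods such as the dynamic regularization above aim to reduce.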
Skip \n: A simple method to reduce hallucination in large vision-language models
Recent advancements in large vision-language models (LVLMs) have demonstrated
impressive capability in visual information understanding with human language. Despite …
Nuisances via negativa: Adjusting for spurious correlations via data augmentation
In prediction tasks, there exist features that are related to the label in the same way across
different settings for that task; these are semantic features or semantics. Features with …
Unsupervised domain adaptation via contrastive adversarial domain mixup: A case study on covid-19
Training large deep learning (DL) models with high performance for natural language
downstream tasks usually requires rich-labeled data. However, in a real-world application of …
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Y Kumar, P Marttinen - arXiv preprint arXiv:2403.10153, 2024 - arxiv.org
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert
annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in …