Provable dynamic fusion for low-quality multimodal data

Q Zhang, H Wu, C Zhang, Q Hu, H Fu… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
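
The snippet stops before the fusion rule itself. One common instantiation of dynamic fusion weights each modality's prediction by a per-sample confidence score; the sketch below uses maximum softmax probability as that score, which is an illustrative proxy and not necessarily this paper's estimator:

```python
import torch
import torch.nn.functional as F

def dynamic_fusion(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Fuse two modalities' logits with per-sample confidence weights.

    Confidence here is the maximum softmax probability of each unimodal
    head -- an illustrative proxy, not necessarily the paper's estimator.
    """
    conf_a = F.softmax(logits_a, dim=-1).max(dim=-1, keepdim=True).values
    conf_b = F.softmax(logits_b, dim=-1).max(dim=-1, keepdim=True).values
    # Normalize the two confidences so the weights sum to 1 per sample.
    w_a = conf_a / (conf_a + conf_b)
    w_b = 1.0 - w_a
    return w_a * logits_a + w_b * logits_b

# Example: batch of 4 samples, 3 classes, two hypothetical unimodal heads.
fused = dynamic_fusion(torch.randn(4, 3), torch.randn(4, 3))
print(fused.shape)  # torch.Size([4, 3])
```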

Change is hard: A closer look at subpopulation shift

Y Yang, H Zhang, D Katabi, M Ghassemi - arXiv preprint arXiv:2302.12254, 2023 - arxiv.org
Machine learning models often perform poorly on subgroups that are underrepresented in
the training data. Yet, little is understood about the variation in mechanisms that cause …

The benefits of mixup for feature learning

D Zou, Y Cao, Y Li, Q Gu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Mixup, a simple data augmentation method that randomly mixes two data points via linear
interpolation, has been extensively applied in various deep learning applications to gain …
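
The linear interpolation the snippet describes is compact enough to show directly. A minimal PyTorch sketch of standard mixup training (the Beta-distributed weight and the paired loss follow the common formulation, not anything specific to this analysis paper):

```python
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha: float = 0.2):
    """Mix a batch with a randomly permuted copy of itself.

    Returns mixed inputs, both label sets, and the mixing weight lam,
    so the loss can be interpolated the same way as the inputs.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    return x_mixed, y, y[perm], lam

def mixup_loss(logits, y_a, y_b, lam: float):
    # Interpolate the two cross-entropy terms with the same weight lam.
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)
```

In a training loop, x_mixed feeds the model and mixup_loss replaces the plain cross-entropy.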

Geodesic multi-modal mixup for robust fine-tuning

C Oh, J So, H Byun, YT Lim, M Shin… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Pre-trained multi-modal models, such as CLIP, provide transferable embeddings and show
promising results in diverse applications. However, the analysis of learned multi-modal …
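
Since CLIP embeddings are L2-normalized onto the unit hypersphere, "geodesic" mixup plausibly means interpolating along the great circle rather than the chord. A sketch of spherical linear interpolation (slerp) between normalized embeddings, as an illustration of the idea rather than the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def geodesic_mixup(u: torch.Tensor, v: torch.Tensor, lam: float) -> torch.Tensor:
    """Slerp between two batches of embeddings on the unit hypersphere."""
    u = F.normalize(u, dim=-1)
    v = F.normalize(v, dim=-1)
    # Angle between each pair of embeddings, clamped for numerical safety.
    cos = (u * v).sum(dim=-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos)
    sin = torch.sin(theta)
    # Spherical interpolation stays on the sphere, unlike linear mixing,
    # which pulls the mixed point toward the interior of the ball.
    return (torch.sin((1 - lam) * theta) / sin) * u + (torch.sin(lam * theta) / sin) * v
```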

Enhancing minority classes by mixing: an adaptative optimal transport approach for long-tailed classification

J Gao, H Zhao, Z Li, D Guo - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Real-world data often suffers from severe class imbalance, where several majority
classes have a significantly larger presence in the training set than minority classes. One …

Selective learning: Towards robust calibration with dynamic regularization

Z Han, Y Yang, C Zhang, L Zhang, JT Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Miscalibration in deep learning refers to a discrepancy between a model's predicted
confidence and its actual performance. This problem usually arises from overfitting …
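
The discrepancy the snippet defines is usually quantified with expected calibration error (ECE): bin predictions by confidence and average the gap between accuracy and mean confidence per bin. A minimal NumPy sketch (ECE is the standard diagnostic, not this paper's proposed dynamic regularizer):

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins: int = 10) -> float:
    """ECE: bin-weighted average of |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece
```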

Skip \n: A simple method to reduce hallucination in large vision-language models

Z Han, Z Bai, H Mei, Q Xu, C Zhang… - arXiv preprint arXiv …, 2024 - oar.a-star.edu.sg
Recent advancements in large vision-language models (LVLMs) have demonstrated
impressive capabilities in understanding visual information through human language. Despite …

Nuisances via negativa: Adjusting for spurious correlations via data augmentation

A Puli, N Joshi, Y Wald, H He, R Ranganath - arXiv preprint arXiv …, 2022 - arxiv.org
In prediction tasks, there exist features that are related to the label in the same way across
different settings for that task; these are semantic features or semantics. Features with …

Unsupervised domain adaptation via contrastive adversarial domain mixup: A case study on COVID-19

H Zeng, Z Yue, L Shang, Y Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Training large deep learning (DL) models to high performance on natural language
downstream tasks usually requires richly labeled data. However, in a real-world application of …
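
The snippet ends before the method; the title's "domain mixup" plausibly refers to interpolating source- and target-domain representations so that intermediate domains bridge the gap. A hedged sketch at the feature level, with the contrastive and adversarial components omitted and all names hypothetical:

```python
import torch

def domain_mixup(src_feats: torch.Tensor, tgt_feats: torch.Tensor, alpha: float = 1.0):
    """Interpolate source and target feature batches to form intermediate domains.

    Returns the mixed features and the mixing weight lam, which can serve
    as a soft domain label for an adversarial domain discriminator.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    n = min(src_feats.size(0), tgt_feats.size(0))  # align batch sizes
    mixed = lam * src_feats[:n] + (1.0 - lam) * tgt_feats[:n]
    return mixed, lam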

Improving Medical Multi-modal Contrastive Learning with Expert Annotations

Y Kumar, P Marttinen - arXiv preprint arXiv:2403.10153, 2024 - arxiv.org
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert
annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in …
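
One plausible way to inject eye-gaze heatmaps into contrastive training is to build an expert-weighted view of the image to encode alongside the original. The sketch below shows only that weighting step; the function name, the floor parameter, and the pairing into the CLIP loss are assumptions, not eCLIP's published pipeline:

```python
import torch

def expert_weighted_view(image: torch.Tensor, heatmap: torch.Tensor, floor: float = 0.3) -> torch.Tensor:
    """Re-weight image pixels by a radiologist eye-gaze heatmap.

    image:   (C, H, W) tensor in [0, 1]
    heatmap: (H, W) gaze density, normalized to [0, 1]
    floor:   minimum weight so unattended regions are dimmed, not erased
    """
    weights = floor + (1.0 - floor) * heatmap  # values in [floor, 1]
    return image * weights.unsqueeze(0)        # broadcast over channels
```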