No" zero-shot" without exponential data: Pretraining concept frequency determines multimodal model performance

V Udandarao, A Prabhu, A Ghosh… - The Thirty-eighth …, 2024 - openreview.net
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation
performance of multimodal models, such as CLIP for classification and Stable-Diffusion for …

Data curation via joint example selection further accelerates multimodal learning

T Evans, N Parthasarathy, H Merzic… - arXiv preprint arXiv …, 2024 - arxiv.org
Data curation is an essential component of large-scale pretraining. In this work, we
demonstrate that jointly selecting batches of data is more effective for learning than selecting …
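
The snippet's key claim is that scoring whole batches jointly beats ranking examples independently. As a rough illustration only (a minimal sketch, not the paper's actual algorithm), one simple instantiation scores candidate sub-batches by a "learnability" signal, the gap between a learner's loss and a reference model's loss, and keeps the best-scoring sub-batch; the function and parameter names below are hypothetical.

import numpy as np

def learnability(learner_loss, reference_loss):
    # Examples the current learner finds hard but a trusted reference
    # model finds easy are treated as the most "learnable".
    return learner_loss - reference_loss

def select_joint_batch(learner_loss, reference_loss, batch_size,
                       n_candidates=16, seed=0):
    # Score whole candidate sub-batches and keep the best one, rather
    # than ranking and thresholding examples one at a time.
    rng = np.random.default_rng(seed)
    scores = learnability(np.asarray(learner_loss), np.asarray(reference_loss))
    best_idx, best_score = None, -np.inf
    for _ in range(n_candidates):
        idx = rng.choice(len(scores), size=batch_size, replace=False)
        if scores[idx].sum() > best_score:
            best_idx, best_score = idx, scores[idx].sum()
    return best_idx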

On catastrophic inheritance of large foundation models

H Chen, B Raj, X Xie, J Wang - arXiv preprint arXiv:2402.01909, 2024 - arxiv.org
Large foundation models (LFMs) are achieving incredible performance. Yet great concerns
have been raised about their mythic, poorly interpreted potential, not only in machine …

CLIP-CID: Efficient CLIP distillation via cluster-instance discrimination

K Yang, T Gu, X An, H Jiang, X Dai, Z Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over
a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial …
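
For context, the objective being distilled here is CLIP's standard symmetric contrastive (InfoNCE) loss over image-text pairs; the sketch below shows that widely known baseline loss, not the paper's cluster-instance discrimination scheme.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Matched image-text pairs sit on the diagonal of the similarity
    # matrix; each modality classifies its counterpart within the batch.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))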

Towards flexible perception with visual memory

R Geirhos, P Jaini, A Stone, S Medapati, X Yi… - arXiv preprint arXiv …, 2024 - arxiv.org
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone:
once the process is completed, editing the knowledge in a network is nearly impossible …

The multilingual alignment prism: Aligning global and local preferences to reduce harm

A Ahmadian, B Ermis, S Goldfarb-Tarrant… - arXiv preprint arXiv …, 2024 - arxiv.org
A key concern with the concept of "alignment" is the implicit question of "alignment to what?".
AI systems are increasingly used across the world, yet safety alignment is often focused on …

LLM see, LLM do: Leveraging active inheritance to target non-differentiable objectives

L Shimabucoro, S Ruder, J Kreutzer… - Proceedings of the …, 2024 - aclanthology.org
The widespread adoption of synthetic data raises new questions about how models
generating the data can influence other large language models (LLMs). To start, our work …

Active data curation effectively distills large-scale multimodal models

V Udandarao, N Parthasarathy, MF Naeem… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into
smaller ones. Prior works have explored ever more complex KD strategies involving different …
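
As background for the KD strategies the snippet mentions, here is the classic soft-target distillation loss (Hinton-style KD), given as a minimal sketch rather than any specific strategy from this paper.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Match the student's softened distribution to the teacher's;
    # the T^2 factor keeps gradient scale comparable across temperatures.
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)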

LLM see, LLM do: Guiding data generation to target non-differentiable objectives

L Shimabucoro, S Ruder, J Kreutzer, M Fadaee… - arXiv preprint arXiv …, 2024 - arxiv.org
The widespread adoption of synthetic data raises new questions about how models
generating the data can influence other large language models (LLMs) via distilled data. To …

Object-Focused Data Selection for Dense Prediction Tasks

N Popp, D Zhang, JH Metzen, M Hein… - arXiv preprint arXiv …, 2024 - arxiv.org
Dense prediction tasks such as object detection and segmentation require high-quality
labels at pixel level, which are costly to obtain. Recent advances in foundation models have …