No" zero-shot" without exponential data: Pretraining concept frequency determines multimodal model performance
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation
performance of multimodal models, such as CLIP for classification and Stable-Diffusion for …
Data curation via joint example selection further accelerates multimodal learning
Data curation is an essential component of large-scale pretraining. In this work, we
demonstrate that jointly selecting batches of data is more effective for learning than selecting …
On catastrophic inheritance of large foundation models
Large foundation models (LFMs) are achieving remarkable performance. Yet serious concerns
have been raised about their opaque and poorly understood capabilities, not only in machine …
CLIP-CID: Efficient CLIP distillation via cluster-instance discrimination
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over
a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial …
Towards flexible perception with visual memory
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone:
once the process is completed, editing the knowledge in a network is nearly impossible …
The multilingual alignment prism: Aligning global and local preferences to reduce harm
A key concern with the concept of "alignment" is the implicit question of "alignment to what?".
AI systems are increasingly used across the world, yet safety alignment is often focused on …
LLM see, LLM do: Leveraging active inheritance to target non-differentiable objectives
The widespread adoption of synthetic data raises new questions about how models
generating the data can influence other large language models (LLMs). To start, our work …
Active data curation effectively distills large-scale multimodal models
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into
smaller ones. Prior works have explored ever more complex KD strategies involving different …
LLM see, LLM do: Guiding data generation to target non-differentiable objectives
The widespread adoption of synthetic data raises new questions about how models
generating the data can influence other large language models (LLMs) via distilled data. To …
Object-Focused Data Selection for Dense Prediction Tasks
Dense prediction tasks such as object detection and segmentation require high-quality
pixel-level labels, which are costly to obtain. Recent advances in foundation models have …