DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
On feature learning in the presence of spurious correlations
Deep classifiers are known to rely on spurious features—patterns which are correlated with
the target on the training data but not inherently relevant to the learning problem, such as the …
Connect, not collapse: Explaining contrastive learning for unsupervised domain adaptation
We consider unsupervised domain adaptation (UDA), where labeled data from a source
domain (e.g., photos) and unlabeled data from a target domain (e.g., sketches) are used to …
Wild-time: A benchmark of in-the-wild distribution shift over time
Distribution shifts occur when the test distribution differs from the training distribution, and
can considerably degrade performance of machine learning models deployed in the real …
Artificial intelligence for science in quantum, atomistic, and continuum systems
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural
sciences. Today, AI has started to advance natural sciences by improving, accelerating, and …
Make the U in UDA matter: Invariant consistency learning for unsupervised domain adaptation
Domain Adaptation (DA) is always challenged by the spurious correlation between
the domain-invariant features (e.g., class identity) and the domain-specific ones (e.g. …
A broad study of pre-training for domain generalization and adaptation
Deep models must learn robust and transferable representations in order to perform well on
new domains. While domain transfer methods (e.g., domain adaptation, domain …
DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery--A Focus on Affinity Prediction Problems with Noise Annotations
AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making
the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its …
Domain adaptation under open set label shift
S Garg, S Balakrishnan… - Advances in Neural …, 2022 - proceedings.neurips.cc
We introduce the problem of domain adaptation under Open Set Label Shift (OSLS), where
the label distribution can change arbitrarily and a new class may arrive during deployment …
Towards federated foundation models: Scalable dataset pipelines for group-structured learning
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g.,
federated) datasets, enabling federated learning simulation at the scale of foundation …