Data stream clustering: A survey
Data stream mining is an active research area that has recently emerged to discover
knowledge from large amounts of continuously generated data. In this context, several data …
DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
CAFE: Learning to condense dataset by aligning features
Dataset condensation aims at reducing the network training effort through condensing a
cumbersome training set into a compact synthetic one. State-of-the-art approaches largely …
Dataset condensation with differentiable siamese augmentation
In many machine learning problems, large-scale datasets have become the de-facto
standard to train state-of-the-art deep networks at the price of heavy computation load. In this …
Improved distribution matching for dataset condensation
Dataset Condensation aims to condense a large dataset into a smaller one while
maintaining its ability to train a well-performing model, thus reducing the storage cost and …
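The snippet above only gestures at how distribution matching works. As a loose illustration of the idea (not the paper's actual method, and with every name and number invented for the toy), one can condense a dataset by driving the statistics of a small synthetic set toward those of a large real set — here just plain feature means:

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=(1000, 8))  # toy "real" feature set
syn = rng.normal(size=(10, 8))                         # tiny synthetic set

# gradient descent on || mean(syn) - mean(real) ||^2 w.r.t. the synthetic points
lr = 0.5
for _ in range(200):
    grad = 2.0 * (syn.mean(axis=0) - real.mean(axis=0)) / len(syn)
    syn -= lr * grad

mean_gap = np.linalg.norm(syn.mean(axis=0) - real.mean(axis=0))
```

Published distribution-matching methods compare much richer statistics (e.g. embeddings from randomly initialized networks) rather than raw input means, but the optimization loop has this shape.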
Dataset condensation with gradient matching
As the state-of-the-art machine learning methods in many fields rely on larger datasets,
storing datasets and training models on them become significantly more expensive. This …
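As a toy sketch of the gradient-matching idea named in this title (not the paper's algorithm, which matches deep-network gradients along a training trajectory), one can fit the labels of a small synthetic set so that its loss gradient at a fixed probe parameter matches the gradient on the full set; all sizes and variable names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))                 # large "real" dataset
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=2000)

Xs = rng.normal(size=(20, 5))                  # fixed synthetic inputs
ys = np.zeros(20)                              # synthetic labels to learn
w = np.zeros(5)                                # probe model parameters

# gradient of the MSE loss at w on the real data
g_real = 2 * X.T @ (X @ w - y) / len(X)

lr = 0.5
for _ in range(1000):
    g_syn = 2 * Xs.T @ (Xs @ w - ys) / len(Xs)
    diff = g_syn - g_real
    # analytic gradient of ||g_syn - g_real||^2 w.r.t. ys (g_syn is linear in ys)
    ys += lr * (4 / len(Xs)) * (Xs @ diff)

match_err = np.linalg.norm(2 * Xs.T @ (Xs @ w - ys) / len(Xs) - g_real)
```

Training a model on the 20 matched synthetic points then takes (to first order at `w`) the same first step as training on all 2000 real points, which is the intuition the title refers to.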
Coresets for data-efficient training of machine learning models
B Mirzasoleiman, J Bilmes… - … Conference on Machine …, 2020 - proceedings.mlr.press
Incremental gradient (IG) methods, such as stochastic gradient descent and its variants are
commonly used for large scale optimization in machine learning. Despite the sustained effort …
Dataset pruning: Reducing training data by examining generalization influence
The great success of deep learning heavily relies on increasingly larger training data, which
comes at a price of huge computational and infrastructural costs. This poses crucial …
Turning Big Data Into Tiny Data: Constant-Size Coresets for k-Means, PCA, and Projective Clustering
We develop and analyze a method to reduce the size of a very large set of data points in a
high-dimensional Euclidean space R^d to a small set of weighted points such that the result …
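The paper's constant-size coresets come with formal approximation guarantees; a much looser toy sketch of the underlying importance-sampling principle (sample points with probability tilted toward far-away points, then reweight by inverse probability so the weighted cost stays unbiased — all constants here are illustrative, not from the paper) looks like:

```python
import numpy as np

rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.3, size=(500, 2)) for c in centers])

def kmeans_cost(pts, ctrs, weights=None):
    # sum of (weighted) squared distances to the nearest center
    d2 = ((pts[:, None, :] - ctrs[None, :, :]) ** 2).sum(-1).min(1)
    return d2.sum() if weights is None else (weights * d2).sum()

# sensitivity-style sampling: mix uniform mass with distance-to-centroid mass
d2 = ((X - X.mean(0)) ** 2).sum(1)
p = 0.5 * d2 / d2.sum() + 0.5 / len(X)
idx = rng.choice(len(X), size=100, p=p)
coreset, w = X[idx], 1.0 / (len(idx) * p[idx])  # inverse-probability weights

full_cost = kmeans_cost(X, centers)
coreset_cost = kmeans_cost(coreset, centers, w)
```

The 100 weighted points can then stand in for the 1500 originals when evaluating (or optimizing) the k-means objective; the weighted cost is an unbiased estimate of the full cost for any set of candidate centers.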
T-MARS: Improving visual representations by circumventing text feature learning
Large web-sourced multimodal datasets have powered a slew of new methods for learning
general-purpose visual representations, advancing the state of the art in computer vision …