Data stream clustering: A survey

JA Silva, ER Faria, RC Barros, ER Hruschka… - ACM Computing …, 2013 - dl.acm.org
Data stream mining is an active research area that has recently emerged to discover
knowledge from large amounts of continuously generated data. In this context, several data …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

CAFE: Learning to condense dataset by aligning features

K Wang, B Zhao, X Peng, Z Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Dataset condensation aims at reducing the network training effort through condensing a
cumbersome training set into a compact synthetic one. State-of-the-art approaches largely …
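For orientation, a sketch of the bilevel objective that dataset condensation methods build on; the notation here is generic and assumed, not taken from this paper:
  S^* = \arg\min_{S} \mathcal{L}^{T}(\theta^{S}) \quad \text{subject to} \quad \theta^{S} = \arg\min_{\theta} \mathcal{L}^{S}(\theta),
where T is the original training set, S the small synthetic set, and \mathcal{L}^{T}, \mathcal{L}^{S} the training losses computed on each.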

Dataset condensation with differentiable siamese augmentation

B Zhao, H Bilen - International Conference on Machine …, 2021 - proceedings.mlr.press
In many machine learning problems, large-scale datasets have become the de-facto
standard to train state-of-the-art deep networks at the price of heavy computation load. In this …

Improved distribution matching for dataset condensation

G Zhao, G Li, Y Qin, Y Yu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Dataset Condensation aims to condense a large dataset into a smaller one while
maintaining its ability to train a well-performing model, thus reducing the storage cost and …
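As a rough sketch only (generic notation, not this paper's exact formulation): distribution-matching approaches to condensation typically learn S by matching feature statistics of real and synthetic data under randomly sampled embedding networks \psi_{\theta},
  \min_{S} \; \mathbb{E}_{\theta \sim P_{\theta}} \Big\| \tfrac{1}{|T|}\sum_{x \in T} \psi_{\theta}(x) - \tfrac{1}{|S|}\sum_{s \in S} \psi_{\theta}(s) \Big\|^{2}.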

Dataset condensation with gradient matching

B Zhao, KR Mopuri, H Bilen - arXiv preprint arXiv:2006.05929, 2020 - arxiv.org
As the state-of-the-art machine learning methods in many fields rely on larger datasets,
storing datasets and training models on them become significantly more expensive. This …
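A sketch of the gradient-matching idea named in the title, in generic (assumed) notation: the synthetic set S is optimized so that gradients of the network loss on S track those on the real data T along the training trajectory,
  \min_{S} \sum_{t} D\big( \nabla_{\theta} \mathcal{L}^{S}(\theta_{t}), \; \nabla_{\theta} \mathcal{L}^{T}(\theta_{t}) \big),
where D is a gradient distance (e.g., a layer-wise cosine-based distance).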

Coresets for data-efficient training of machine learning models

B Mirzasoleiman, J Bilmes… - … Conference on Machine …, 2020 - proceedings.mlr.press
Incremental gradient (IG) methods, such as stochastic gradient descent and its variants, are
commonly used for large scale optimization in machine learning. Despite the sustained effort …

Dataset pruning: Reducing training data by examining generalization influence

S Yang, Z Xie, H Peng, M Xu, M Sun, P Li - arXiv preprint arXiv …, 2022 - arxiv.org
The great success of deep learning heavily relies on increasingly larger training data, which
comes at a price of huge computational and infrastructural costs. This poses crucial …

Turning Big Data Into Tiny Data: Constant-Size Coresets for k-Means, PCA, and Projective Clustering

D Feldman, M Schmidt, C Sohler - SIAM Journal on Computing, 2020 - SIAM
We develop and analyze a method to reduce the size of a very large set of data points in a
high-dimensional Euclidean space R^d to a small set of weighted points such that the result …
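For reference, the standard (strong) coreset guarantee such constructions aim at, written in generic notation: a weighted set C with weights w approximates the cost of the full point set P for every candidate solution Q (e.g., any set of k centers),
  (1 - \varepsilon)\,\mathrm{cost}(P, Q) \;\le\; \sum_{c \in C} w(c)\,\mathrm{cost}(c, Q) \;\le\; (1 + \varepsilon)\,\mathrm{cost}(P, Q).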

T-MARS: Improving visual representations by circumventing text feature learning

P Maini, S Goyal, ZC Lipton, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Large web-sourced multimodal datasets have powered a slew of new methods for learning
general-purpose visual representations, advancing the state of the art in computer vision …