Dataset distillation using neural feature regression
Dataset distillation aims to learn a small synthetic dataset that preserves most of the
information from the original dataset. Dataset distillation can be formulated as a bi-level …
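The bi-level formulation mentioned in this snippet can be illustrated with a short sketch: the inner problem trains a model on the learnable synthetic images, and the outer problem updates those images so that the resulting model fits real data. This is a minimal unrolled-gradient sketch in PyTorch, not the paper's neural-feature-regression method; the architecture, step counts, and learning rates are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical setup: ten learnable synthetic images, one per class.
x_syn = torch.randn(10, 1, 28, 28, requires_grad=True)
y_syn = torch.arange(10)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
outer_opt = torch.optim.Adam([x_syn], lr=1e-2)

def real_batch():
    # Stand-in for sampling a batch from the original dataset.
    return torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))

for outer_step in range(100):
    # Inner problem: a few unrolled SGD steps on the synthetic set, kept on the
    # autograd tape so that gradients can flow back into x_syn.
    w = {k: v.detach().clone().requires_grad_(True) for k, v in model.named_parameters()}
    for _ in range(5):
        logits = torch.func.functional_call(model, w, (x_syn,))
        inner_loss = F.cross_entropy(logits, y_syn)
        grads = torch.autograd.grad(inner_loss, list(w.values()), create_graph=True)
        w = {k: v - 0.1 * g for (k, v), g in zip(w.items(), grads)}

    # Outer problem: the weights trained on synthetic data should fit real data.
    x_real, y_real = real_batch()
    outer_loss = F.cross_entropy(torch.func.functional_call(model, w, (x_real,)), y_real)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```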
Theseus: A library for differentiable nonlinear optimization
We present Theseus, an efficient application-agnostic open source library for differentiable
nonlinear least squares (DNLS) optimization built on PyTorch, providing a common …
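As a rough illustration of what "differentiable nonlinear least squares" means (a generic sketch, not Theseus's actual API): an inner Gauss-Newton solve is kept on the autograd tape, so the solution, and any loss computed from it, can be differentiated with respect to outer parameters that the residual closes over.

```python
import torch

def gauss_newton(residual_fn, theta0, num_steps=10, damping=1e-6):
    """Differentiable Gauss-Newton: every update stays on the autograd tape, so
    gradients flow from the returned solution back to any outer parameters used
    inside the residual function."""
    theta = theta0
    for _ in range(num_steps):
        r = residual_fn(theta)
        J = torch.autograd.functional.jacobian(residual_fn, theta, create_graph=True)
        H = J.T @ J + damping * torch.eye(theta.numel())
        theta = theta - torch.linalg.solve(H, J.T @ r)
    return theta

# Toy outer/inner setup: w scales the observations; the inner problem is a
# least-squares fit, and we differentiate its solution with respect to w.
torch.manual_seed(0)
A = torch.randn(20, 3)
b = torch.randn(20)
w = torch.tensor(1.5, requires_grad=True)       # outer (learnable) parameter

def residual(theta):
    return A @ theta - w * b                     # inner residual depends on w

theta_star = gauss_newton(residual, torch.zeros(3))
outer_loss = theta_star.pow(2).sum()             # any downstream loss
outer_loss.backward()                            # gradient w.r.t. w through the solver
print(w.grad)
```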
Data distillation: A survey
N Sachdeva, J McAuley - arXiv preprint arXiv:2301.04272, 2023 - arxiv.org
The popularity of deep learning has led to the curation of a vast number of massive and
multifarious datasets. Despite having close-to-human performance on individual tasks …
Dataset distillation with convexified implicit gradients
We propose a new dataset distillation algorithm using reparameterization and
convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art …
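For context on the "implicit gradients" part of this snippet (a generic implicit-function-theorem sketch, not the RCIG algorithm; it assumes a flat 1-D parameter vector and a model small enough for a dense Hessian solve): instead of backpropagating through an unrolled inner training loop, the hypergradient with respect to the synthetic data is computed at an approximate inner optimum.

```python
import torch

def implicit_hypergradient(inner_loss_fn, outer_loss_fn, theta, x, damping=1e-3):
    """Gradient of the outer loss w.r.t. synthetic data x at an (approximate)
    inner optimum theta, via the implicit function theorem:
        dL_out/dx = -(d^2 L_in / dx dtheta)^T (d^2 L_in / dtheta^2)^{-1} dL_out/dtheta
    assuming the outer loss depends on x only through theta."""
    theta = theta.detach().requires_grad_(True)
    x = x.detach().requires_grad_(True)

    g_out = torch.autograd.grad(outer_loss_fn(theta), theta)[0]
    g_in = torch.autograd.grad(inner_loss_fn(theta, x), theta, create_graph=True)[0]

    # Hessian of the inner loss w.r.t. theta, built row by row (keeping the graph).
    H = torch.stack([torch.autograd.grad(g_in[i], theta, retain_graph=True)[0]
                     for i in range(g_in.numel())])
    v = torch.linalg.solve(H + damping * torch.eye(H.shape[0]), g_out)

    # Mixed second derivative contracted with v, differentiated w.r.t. x.
    return -torch.autograd.grad(g_in @ v, x)[0]
```

In practice the dense Hessian solve is replaced by conjugate gradient or a Neumann-series approximation, and making the inner problem (closer to) convex, as the RCIG title suggests, keeps that solve well behaved.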
The elements of differentiable programming
Artificial intelligence has recently experienced remarkable advances, fueled by large
models, vast datasets, accelerated hardware, and, last but not least, the transformative …
Velo: Training versatile learned optimizers by scaling up
While deep learning models have replaced hand-designed features across many domains,
these models are still trained with hand-designed optimizers. In this work, we leverage the …
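A learned optimizer of the kind this entry refers to can be sketched in a few lines (a toy coordinate-wise example, not VeLO; the feature set, network size, and 0.01 output scale are arbitrary assumptions): a small MLP maps per-parameter features such as the gradient and a momentum term to an update, and that MLP's weights are what gets meta-trained across many tasks.

```python
import torch
import torch.nn as nn

class TinyLearnedOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: an MLP maps per-parameter features
    (gradient and momentum) to an update; its weights are the meta-parameters."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def init_state(self, params):
        return [torch.zeros_like(p) for p in params]

    def step(self, params, grads, state, beta=0.9, scale=0.01):
        new_params, new_state = [], []
        for p, g, m in zip(params, grads, state):
            m = beta * m + (1 - beta) * g
            feats = torch.stack([g.flatten(), m.flatten()], dim=-1)   # (n_elems, 2)
            update = self.net(feats).reshape(p.shape)
            new_params.append(p - scale * update)
            new_state.append(m)
        return new_params, new_state

# Toy usage on a quadratic (the optimizer is untrained here, so updates are arbitrary):
opt = TinyLearnedOptimizer()
params = [torch.randn(5, requires_grad=True)]
state = opt.init_state(params)
for _ in range(10):
    loss = (params[0] ** 2).sum()
    grads = torch.autograd.grad(loss, params)
    params, state = opt.step(params, grads, state)
    params = [p.detach().requires_grad_(True) for p in params]
```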
evosax: Jax-based evolution strategies
RT Lange - Proceedings of the Companion Conference on Genetic …, 2023 - dl.acm.org
The deep learning revolution has greatly been accelerated by the 'hardware lottery': Recent
advances in modern hardware accelerators, compilers and the availability of open-source …
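A typical evosax workflow follows an ask–tell loop, sketched below after the README's usage pattern; the constructor and argument names (popsize, num_dims) and method names (initialize, ask, tell) may differ between library versions, so treat them as assumptions.

```python
import jax
import jax.numpy as jnp
from evosax import CMA_ES

def sphere(x):
    # Toy fitness: squared norm of each population member (lower is better).
    return jnp.sum(x ** 2, axis=-1)

rng = jax.random.PRNGKey(0)
strategy = CMA_ES(popsize=32, num_dims=8)
es_params = strategy.default_params
state = strategy.initialize(rng, es_params)

for gen in range(100):
    rng, rng_ask = jax.random.split(rng)
    x, state = strategy.ask(rng_ask, state, es_params)    # sample candidate solutions
    fitness = sphere(x)                                    # evaluate the population
    state = strategy.tell(x, fitness, state, es_params)   # update the search distribution
```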
Re-parameterizing your optimizers rather than architectures
The well-designed structures in neural networks reflect the prior knowledge incorporated
into the models. However, though different models have various priors, we are used to …
Tutorial on amortized optimization
B Amos - Foundations and Trends® in Machine Learning, 2023 - nowpublishers.com
Optimization is a ubiquitous modeling tool and is often deployed in settings which
repeatedly solve similar instances of the same problem. Amortized optimization methods …
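The amortization idea this entry describes can be shown concretely (a minimal sketch; the quadratic problem family, network, and objective-based training loss are illustrative choices, not the tutorial's notation): a network maps problem parameters directly to approximate solutions and is trained on the objective values of its own predictions, so that at deployment a single forward pass replaces a per-instance optimization run.

```python
import torch
import torch.nn as nn

# Family of problems indexed by phi: minimize_y  0.5 * ||y||^2 - phi @ y,
# whose closed-form solution y* = phi makes the amortized model easy to check.
class AmortizedSolver(nn.Module):
    def __init__(self, dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, phi):
        return self.net(phi)

def objective(y, phi):
    return 0.5 * (y ** 2).sum(-1) - (phi * y).sum(-1)

model = AmortizedSolver()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    phi = torch.randn(128, 4)                   # sample problem instances
    loss = objective(model(phi), phi).mean()    # objective-based (self-supervised) training
    opt.zero_grad()
    loss.backward()
    opt.step()

# A single forward pass now approximates the per-instance solution.
phi_test = torch.randn(4)
print(model(phi_test), phi_test)                # should be close after training
```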
Discovering evolution strategies via meta-black-box optimization
Optimizing functions without access to gradients is the remit of black-box methods such as
evolution strategies. While highly general, their learning dynamics are oftentimes heuristic …
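To make the "optimizing without access to gradients" point concrete (a from-scratch toy evolution strategy, not the meta-learned strategies the paper discovers; population size, elite fraction, and step size are arbitrary): candidates are sampled around a mean, ranked by fitness, and the best ones are recombined into the new mean, treating the objective purely as a black box.

```python
import numpy as np

def simple_es(fitness_fn, dim, popsize=64, elite=16, sigma=0.1, generations=200, seed=0):
    """Plain (mu, lambda)-style evolution strategy: sample around the mean, keep the
    best `elite` candidates, and recombine them into the new mean. No gradients of
    fitness_fn are ever taken."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    for _ in range(generations):
        candidates = mean + sigma * rng.standard_normal((popsize, dim))
        fitness = np.array([fitness_fn(c) for c in candidates])   # lower is better
        elites = candidates[np.argsort(fitness)[:elite]]
        mean = elites.mean(axis=0)                                 # recombination
    return mean

# Toy usage: minimize a shifted sphere function.
target = np.arange(5, dtype=float)
solution = simple_es(lambda x: float(np.sum((x - target) ** 2)), dim=5)
print(np.round(solution, 2))   # approaches target
```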