A tour of reinforcement learning: The view from continuous control

B Recht - Annual Review of Control, Robotics, and Autonomous …, 2019 - annualreviews.org
This article surveys reinforcement learning from the perspective of optimization and control,
with a focus on continuous control applications. It reviews the general formulation …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Derivative-free optimization methods

J Larson, M Menickelly, SM Wild - Acta Numerica, 2019 - cambridge.org
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …

Derivative-free reinforcement learning: A review

H Qian, Y Yu - Frontiers of Computer Science, 2021 - Springer
Reinforcement learning is about learning agent models that make the best sequential
decisions in unknown environments. In an unknown environment, the agent needs to …

Simple random search of static linear policies is competitive for reinforcement learning

H Mania, A Guy, B Recht - Advances in neural information …, 2018 - proceedings.neurips.cc
Model-free reinforcement learning aims to offer off-the-shelf solutions for controlling
dynamical systems without requiring models of the system dynamics. We introduce a model …

Simple random search provides a competitive approach to reinforcement learning

H Mania, A Guy, B Recht - arXiv preprint arXiv:1803.07055, 2018 - arxiv.org
A common belief in model-free reinforcement learning is that methods based on random
search in the parameter space of policies exhibit significantly worse sample complexity than …

Derivative-free methods for policy optimization: Guarantees for linear quadratic systems

D Malik, A Pananjady, K Bhatia, K Khamaru… - Journal of Machine …, 2020 - jmlr.org
We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …

Optimal rates for zero-order convex optimization: The power of two function evaluations

JC Duchi, MI Jordan, MJ Wainwright… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
We consider derivative-free algorithms for stochastic and nonstochastic convex optimization
problems that use only function values rather than gradients. Focusing on nonasymptotic …

Radiative backpropagation: An adjoint method for lightning-fast differentiable rendering

M Nimier-David, S Speierer, B Ruiz… - ACM Transactions on …, 2020 - dl.acm.org
Physically based differentiable rendering has recently evolved into a powerful tool for
solving inverse problems involving light. Methods in this area perform a differentiable …