Worst cases policy gradients

How to certify machine learning based safety-critical systems? A systematic literature review

F Tambon, G Laberge, L An, A Nikanjam… - Automated Software …, 2022 - Springer

Abstract. Context: Machine Learning (ML) has been at the heart of many innovations over the
past years. However, including it in so-called “safety-critical” systems such as automotive or …

Cited by 82 · All 7 versions

[PDF] arxiv.org

Recovery RL: Safe reinforcement learning with learned recovery zones

B Thananjeyan, A Balakrishna, S Nair… - IEEE Robotics and …, 2021 - ieeexplore.ieee.org

Safety remains a central obstacle preventing widespread use of RL in the real world:
learning new tasks in uncertain environments requires extensive exploration, but safety …

Cited by 250 · All 6 versions

[PDF] arxiv.org

Maximum entropy RL (provably) solves some robust RL problems

B Eysenbach, S Levine - arXiv preprint arXiv:2103.06257, 2021 - arxiv.org

Many potential applications of reinforcement learning (RL) require guarantees that the agent
will perform well in the face of disturbances to the dynamics or reward function. In this paper …

Cited by 194 · All 4 versions

[PDF] mlr.press

Can autonomous vehicles identify, recover from, and adapt to distribution shifts?

A Filos, P Tigkas, R McAllister… - International …, 2020 - proceedings.mlr.press

Abstract: Out-of-training-distribution (OOD) scenarios are a common challenge of learning
agents at deployment, typically leading to arbitrary deductions and poorly-informed …

Cited by 223 · All 5 versions

[PDF] arxiv.org

Learning to be safe: Deep RL with a safety critic

K Srinivasan, B Eysenbach, S Ha, J Tan… - arXiv preprint arXiv …, 2020 - arxiv.org

Safety is an essential component for deploying reinforcement learning (RL) algorithms in
real-world scenarios, and is critical during the learning process itself. A natural first approach …

Cited by 163 · All 3 versions

[PDF] aaai.org

WCSAC: Worst-case soft actor critic for safety-constrained reinforcement learning

Q Yang, TD Simão, SH Tindemans… - Proceedings of the AAAI …, 2021 - ojs.aaai.org

Safe exploration is regarded as a key priority area for reinforcement learning research. With
separate reward and safety signals, it is natural to cast it as constrained reinforcement …

Cited by 139 · All 9 versions

[PDF] neurips.cc

Conservative offline distributional reinforcement learning

Y Ma, D Jayaraman, O Bastani - Advances in neural …, 2021 - proceedings.neurips.cc

Many reinforcement learning (RL) problems in practice are offline, learning purely from
observational data. A key challenge is how to ensure the learned policy is safe, which …

Cited by 93 · All 7 versions

[PDF] neurips.cc

One solution is not all you need: Few-shot extrapolation via structured MaxEnt RL

S Kumar, A Kumar, S Levine… - Advances in Neural …, 2020 - proceedings.neurips.cc

While reinforcement learning algorithms can learn effective policies for complex tasks, these
policies are often brittle to even minor task variations, especially when variations are not …

Cited by 101 · All 5 versions

[PDF] springer.com

Safety-constrained reinforcement learning with a distributional safety critic

Q Yang, TD Simão, SH Tindemans, MTJ Spaan - Machine Learning, 2023 - Springer

Safety is critical to broadening the real-world use of reinforcement learning. Modeling the
safety aspects using a safety-cost signal separate from the reward and bounding the …

Cited by 47 · All 12 versions

[PDF] neurips.cc

Efficient risk-averse reinforcement learning

I Greenberg, Y Chow… - Advances in Neural …, 2022 - proceedings.neurips.cc

In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the
returns. A risk measure often focuses on the worst returns out of the agent's experience. As a …

Cited by 43 · All 9 versions
