Redeeming intrinsic rewards via constrained optimization

From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org

This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Cited by: 90 (3 versions)

Walk these ways: Tuning robot control for generalization with multiplicity of behavior

GB Margolis, P Agrawal - Conference on Robot Learning, 2023 - proceedings.mlr.press

Learned locomotion policies can rapidly adapt to diverse environments similar to those
experienced during training but lack a mechanism for fast tuning when they fail in an out-of …

Cited by: 105 (4 versions)

TGRL: An algorithm for teacher guided reinforcement learning

I Shenfeld, ZW Hong, A Tamar… - … on Machine Learning, 2023 - proceedings.mlr.press

We consider solving sequential decision-making problems in the scenario where the agent
has access to two supervision sources: reward signal and a teacher …

Cited by: 11 (7 versions)

Automatic intrinsic reward shaping for exploration in deep reinforcement learning

M Yuan, B Li, X Jin, W Zeng - International Conference on …, 2023 - proceedings.mlr.press

We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and
adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement …

Cited by: 4 (7 versions)

An invitation to deep reinforcement learning

B Jaeger, A Geiger - arXiv preprint arXiv:2312.08365, 2023 - arxiv.org

Training a deep neural network to maximize a target objective has become the standard
recipe for successful machine learning over the last decade. These networks can be …

Cited by: 5 (5 versions)

A DRL-based path planning method for wheeled mobile robots in unknown environments

T Wen, X Wang, Z Zheng, Z Sun - Computers and Electrical Engineering, 2024 - Elsevier

Deep reinforcement learning-based (DRL-based) path planning in the unknown
environment is studied under continuous action space. We extend the TD3 (twin-delayed …

Cited by: 1

Automatic Environment Shaping is the Next Frontier in RL

Y Park, GB Margolis, P Agrawal - arXiv preprint arXiv:2407.16186, 2024 - arxiv.org

Many roboticists dream of presenting a robot with a task in the evening and returning the
next morning to find the robot capable of solving the task. What is preventing us from …

Cited by: 1 (2 versions)

TGRL: Teacher guided reinforcement learning algorithm for POMDPs

I Shenfeld, ZW Hong, A Tamar… - … Reinforcement Learning at …, 2023 - openreview.net

In many real-world problems, an agent must operate in an uncertain and partially
observable environment. Due to partial information, a policy directly trained to operate from …

Cited by: 5

Pareto Envelope Augmented with Reinforcement Learning: Multi-objective reinforcement learning-based approach for Large-Scale Constrained Pressurized Water …

P Seurin, K Seurin - arXiv preprint arXiv:2312.10194, 2023 - arxiv.org

A novel method, the Pareto Envelope Augmented with Reinforcement Learning (PEARL),
has been developed to address the challenges posed by multi-objective problems …

Cited by: 1 (3 versions)

Random Latent Exploration for Deep Reinforcement Learning

S Mahankali, ZW Hong, A Sekhari, A Rakhlin… - arXiv preprint arXiv …, 2024 - arxiv.org

The ability to efficiently explore high-dimensional state spaces is essential for the practical
success of deep Reinforcement Learning (RL). This paper introduces a new exploration …