Discovering temporally-aware reinforcement learning algorithms

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms

P Li, J Hao, H Tang, X Fu, Y Zhen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs)
and Reinforcement Learning (RL) for optimization, has demonstrated remarkable …

[PDF] arxiv.org

Meta-learning the mirror map in policy mirror descent

C Alfano, S Towers, S Sapora, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org

Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a
unifying perspective that encompasses numerous algorithms. These algorithms are derived …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

EvIL: Evolution Strategies for Generalisable Imitation Learning

S Sapora, G Swamy, C Lu, YW Teh… - arXiv preprint arXiv …, 2024 - arxiv.org

Often times in imitation learning (IL), the environment we collect expert demonstrations in
and the environment we want to deploy our learned policy in aren't exactly the same (eg …

Cited by 2 - Related articles - All 3 versions

[PDF] arxiv.org

Discovering Minimal Reinforcement Learning Environments

J Liesen, C Lu, A Lupu, JN Foerster… - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement learning (RL) agents are commonly trained and evaluated in the same
environment. In contrast, humans often train in a specialized environment before being …

Cited by 1 - Related articles - All 2 versions

[PDF] arxiv.org

Can Learned Optimization Make Reinforcement Learning Less Difficult?

AD Goldie, C Lu, MT Jackson, S Whiteson… - arXiv preprint arXiv …, 2024 - arxiv.org

While reinforcement learning (RL) holds great potential for decision making in the real world,
it suffers from a number of unique difficulties which often need specific consideration. In …

[PDF] arxiv.org

Discovering Preference Optimization Algorithms with and for Large Language Models

C Lu, S Holt, C Fanconi, AJ Chan, J Foerster… - arXiv preprint arXiv …, 2024 - arxiv.org

Offline preference optimization is a key method for enhancing and controlling the quality of
Large Language Model (LLM) outputs. Typically, preference optimization is approached as …

Cited by 2 - Related articles - All 3 versions

[PDF] arxiv.org