Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms

P Li, J Hao, H Tang, X Fu, Y Zhen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs)
and Reinforcement Learning (RL) for optimization, has demonstrated remarkable …

Meta-learning the mirror map in policy mirror descent

C Alfano, S Towers, S Sapora, C Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a
unifying perspective that encompasses numerous algorithms. These algorithms are derived …

EvIL: Evolution Strategies for Generalisable Imitation Learning

S Sapora, G Swamy, C Lu, YW Teh… - arXiv preprint arXiv …, 2024 - arxiv.org
Often times in imitation learning (IL), the environment we collect expert demonstrations in
and the environment we want to deploy our learned policy in aren't exactly the same (e.g. …

Discovering Minimal Reinforcement Learning Environments

J Liesen, C Lu, A Lupu, JN Foerster… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning (RL) agents are commonly trained and evaluated in the same
environment. In contrast, humans often train in a specialized environment before being …

Can Learned Optimization Make Reinforcement Learning Less Difficult?

AD Goldie, C Lu, MT Jackson, S Whiteson… - arXiv preprint arXiv …, 2024 - arxiv.org
While reinforcement learning (RL) holds great potential for decision making in the real world,
it suffers from a number of unique difficulties which often need specific consideration. In …

Discovering Preference Optimization Algorithms with and for Large Language Models

C Lu, S Holt, C Fanconi, AJ Chan, J Foerster… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline preference optimization is a key method for enhancing and controlling the quality of
Large Language Model (LLM) outputs. Typically, preference optimization is approached as …

BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping

A Lidayan, M Dennis, S Russell - arXiv preprint arXiv:2409.05358, 2024 - arxiv.org
Intrinsic motivation (IM) and reward shaping are common methods for guiding the
exploration of reinforcement learning (RL) agents by adding pseudo-rewards. Designing …

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

F Chalumeau, R Shabe, N de Nicola… - arXiv preprint arXiv …, 2024 - arxiv.org
Combinatorial Optimization is crucial to numerous real-world applications, yet still presents
challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer …

JaxLife: An Open-Ended Agentic Simulator

C Lu, M Beukman, M Matthews… - ALIFE 2024: Proceedings …, 2024 - direct.mit.edu
Human intelligence emerged through the process of natural selection and evolution on
Earth. We investigate what it would take to re-create this process in silico. While past work …

Higher Order and Self-Referential Evolution for Population-based Methods

S Coward, C Lu, A Letcher, M Jiang… - … Exploring Meta-Learning … - openreview.net
Due to their simplicity and support of high levels of parallelism, evolutionary algorithms have
regained popularity in machine learning applications such as curriculum generation for …