Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey on Hybrid Algorithms
Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs)
and Reinforcement Learning (RL) for optimization, has demonstrated remarkable …
Meta-learning the mirror map in policy mirror descent
Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a
unifying perspective that encompasses numerous algorithms. These algorithms are derived …
EvIL: Evolution Strategies for Generalisable Imitation Learning
Oftentimes in imitation learning (IL), the environment in which we collect expert demonstrations
and the environment in which we want to deploy our learned policy aren't exactly the same (eg …
Discovering Minimal Reinforcement Learning Environments
Reinforcement learning (RL) agents are commonly trained and evaluated in the same
environment. In contrast, humans often train in a specialized environment before being …
Can Learned Optimization Make Reinforcement Learning Less Difficult?
While reinforcement learning (RL) holds great potential for decision making in the real world,
it suffers from a number of unique difficulties which often need specific consideration. In …
Discovering Preference Optimization Algorithms with and for Large Language Models
Offline preference optimization is a key method for enhancing and controlling the quality of
Large Language Model (LLM) outputs. Typically, preference optimization is approached as …
BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping
A Lidayan, M Dennis, S Russell - arXiv preprint arXiv:2409.05358, 2024 - arxiv.org
Intrinsic motivation (IM) and reward shaping are common methods for guiding the
exploration of reinforcement learning (RL) agents by adding pseudo-rewards. Designing …
Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization
F Chalumeau, R Shabe, N de Nicola… - arXiv preprint arXiv …, 2024 - arxiv.org
Combinatorial Optimization is crucial to numerous real-world applications, yet still presents
challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer …
JaxLife: An Open-Ended Agentic Simulator
Human intelligence emerged through the process of natural selection and evolution on
Earth. We investigate what it would take to re-create this process in silico. While past work …
Higher Order and Self-Referential Evolution for Population-based Methods
Due to their simplicity and support of high levels of parallelism, evolutionary algorithms have
regained popularity in machine learning applications such as curriculum generation for …