Reinforced self-training (ReST) for language modeling
C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model's (LLM) outputs by aligning them with human preferences. We propose a …
Efficient diffusion policies for offline reinforcement learning
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …
Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification
Conservatism has led to significant progress in offline reinforcement learning (RL) where an
agent learns from pre-collected datasets. However, as many real-world scenarios involve …
Large-scale retrieval for reinforcement learning
Effective decision making involves flexibly relating past experiences and relevant contextual
information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm …
Large language models play StarCraft II: Benchmarks and a chain of summarization approach
StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise
micro level operations and strategic macro awareness. Previous works, such as AlphaStar …
Hokoff: real game dataset from Honor of Kings and its offline reinforcement learning benchmarks
The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent
Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre …
An empirical study of implicit regularization in deep offline RL
C Gulcehre, S Srinivasan, J Sygnowski… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep neural networks are the most commonly used function approximators in offline
reinforcement learning. Prior works have shown that neural nets trained with TD-learning …
Learning to Reach Goals via Diffusion
V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org
Diffusion models are a powerful class of generative models capable of mapping random
noise in high-dimensional spaces to a target manifold through iterative denoising. In this …
Surrogate-assisted Monte Carlo Tree Search for real-time video games
Monte Carlo Tree Search (MCTS) is a pronounced empirical search algorithm for
agent decision-making, especially when enhanced by Deep Learning (DL), in mastering …
Offline equilibrium finding
Offline reinforcement learning (offline RL) is an emerging field that has recently begun
gaining attention across various application domains due to its ability to learn strategies from …