Reinforced self-training (ReST) for language modeling

C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model (LLM) outputs by aligning them with human preferences. We propose a …

Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification

L Pan, L Huang, T Ma, H Xu - International conference on …, 2022 - proceedings.mlr.press
Conservatism has led to significant progress in offline reinforcement learning (RL) where an
agent learns from pre-collected datasets. However, as many real-world scenarios involve …

Large-scale retrieval for reinforcement learning

P Humphreys, A Guez, O Tieleman… - Advances in …, 2022 - proceedings.neurips.cc
Effective decision making involves flexibly relating past experiences and relevant contextual
information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm …

Large language models play StarCraft II: Benchmarks and a chain of summarization approach

W Ma, Q Mi, X Yan, Y Wu, R Lin, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise
micro-level operations and strategic macro awareness. Previous works, such as AlphaStar …

Hokoff: Real game dataset from Honor of Kings and its offline reinforcement learning benchmarks

Y Qu, B Wang, J Shao, Y Jiang… - Advances in …, 2024 - proceedings.neurips.cc
The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent
Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre …

An empirical study of implicit regularization in deep offline RL

C Gulcehre, S Srinivasan, J Sygnowski… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep neural networks are the most commonly used function approximators in offline
reinforcement learning. Prior works have shown that neural nets trained with TD-learning …

Learning to Reach Goals via Diffusion

V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org
Diffusion models are a powerful class of generative models capable of mapping random
noise in high-dimensional spaces to a target manifold through iterative denoising. In this …

Surrogate-assisted Monte Carlo Tree Search for real-time video games

MJ Kim, D Lee, JS Kim, CW Ahn - Engineering Applications of Artificial …, 2024 - Elsevier
Monte Carlo Tree Search (MCTS) is a pronounced empirical search algorithm for
agent decision-making, especially when enhanced by Deep Learning (DL), in mastering …

Offline equilibrium finding

S Li, X Wang, Y Zhang, J Cerny, P Li, H Chan… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (offline RL) is an emerging field that has recently begun
gaining attention across various application domains due to its ability to learn strategies from …