StarCraft II unplugged: Large scale offline reinforcement learning

C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) can improve the quality of a large
language model's (LLM) outputs by aligning them with human preferences. We propose a …

Cited by 120 · Related articles · All 4 versions

Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Cited by 22 · Related articles · All 5 versions

Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification

L Pan, L Huang, T Ma, H Xu - International conference on …, 2022 - proceedings.mlr.press

Conservatism has led to significant progress in offline reinforcement learning (RL) where an
agent learns from pre-collected datasets. However, as many real-world scenarios involve …

Cited by 42 · Related articles · All 5 versions

Large-scale retrieval for reinforcement learning

P Humphreys, A Guez, O Tieleman… - Advances in …, 2022 - proceedings.neurips.cc

Effective decision making involves flexibly relating past experiences and relevant contextual
information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm …

Cited by 25 · Related articles · All 6 versions

Large language models play StarCraft II: Benchmarks and a chain of summarization approach

W Ma, Q Mi, X Yan, Y Wu, R Lin, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise
micro-level operations and strategic macro awareness. Previous works, such as AlphaStar …

Cited by 14 · Related articles · All 2 versions

Hokoff: Real game dataset from Honor of Kings and its offline reinforcement learning benchmarks

Y Qu, B Wang, J Shao, Y Jiang… - Advances in …, 2024 - proceedings.neurips.cc

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent
Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre …

Cited by 2 · Related articles · All 3 versions

An empirical study of implicit regularization in deep offline RL

C Gulcehre, S Srinivasan, J Sygnowski… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep neural networks are the most commonly used function approximators in offline
reinforcement learning. Prior works have shown that neural nets trained with TD-learning …

Cited by 13 · Related articles · All 4 versions

Learning to Reach Goals via Diffusion

V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org

Diffusion models are a powerful class of generative models capable of mapping random
noise in high-dimensional spaces to a target manifold through iterative denoising. In this …

Cited by 1 · Related articles · All 4 versions

Surrogate-assisted Monte Carlo Tree Search for real-time video games

MJ Kim, D Lee, JS Kim, CW Ahn - Engineering Applications of Artificial …, 2024 - Elsevier

Monte Carlo Tree Search (MCTS) is a pronounced empirical search algorithm for
agent decision-making, especially when enhanced by Deep Learning (DL), in mastering …

Offline equilibrium finding

S Li, X Wang, Y Zhang, J Cerny, P Li, H Chan… - arXiv preprint arXiv …, 2022 - arxiv.org

Offline reinforcement learning (offline RL) is an emerging field that has recently begun
gaining attention across various application domains due to its ability to learn strategies from …

Cited by 5 · Related articles · All 3 versions