First return, then explore

A brief overview of ChatGPT: The history, status quo and potential future development

T Wu, S He, J Liu, S Sun, K Liu… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org

ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI,
has attracted world-wide attention for its capability of dealing with challenging language …

Cited by 640 · Related articles · All 4 versions

Exploration in deep reinforcement learning: A survey

P Ladosz, L Weng, M Kim, H Oh - Information Fusion, 2022 - Elsevier

This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …

Cited by 250 · Related articles · All 5 versions

Champion-level drone racing using deep reinforcement learning

E Kaufmann, L Bauersfeld, A Loquercio, M Müller… - Nature, 2023 - nature.com

First-person view (FPV) drone racing is a televised sport in which professional competitors
pilot high-speed aircraft through a 3D circuit. Each pilot sees the environment from the …

Cited by 250 · Related articles · All 9 versions

Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokov… - Advances in …, 2022 - proceedings.neurips.cc

Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

Cited by 221 · Related articles · All 6 versions

Jump-start reinforcement learning

I Uchendu, T Xiao, Y Lu, B Zhu, M Yan… - International …, 2023 - proceedings.mlr.press

Reinforcement learning (RL) provides a theoretical framework for continuously improving an
agent's behavior via trial and error. However, efficiently learning policies from scratch can be …

Cited by 93 · Related articles · All 10 versions

URLB: Unsupervised reinforcement learning benchmark

M Laskin, D Yarats, H Liu, K Lee, A Zhan, K Lu… - arXiv preprint arXiv …, 2021 - arxiv.org

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range
of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to …

Cited by 128 · Related articles · All 7 versions

BYOL-Explore: Exploration by bootstrapped prediction

Z Guo, S Thakoor, M Pîslar… - Advances in neural …, 2022 - proceedings.neurips.cc

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven
exploration in visually complex environments. BYOL-Explore learns the world …

Cited by 60 · Related articles · All 5 versions

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in neural …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) provides a promising direction to exploit massive amount
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

Cited by 59 · Related articles · All 8 versions

Reinforcement learning for optimization of variational quantum circuit architectures

M Ostaszewski, LM Trenkwalder… - Advances in …, 2021 - proceedings.neurips.cc

The study of Variational Quantum Eigensolvers (VQEs) has been in the spotlight in
recent times as they may lead to real-world applications of near-term quantum devices …

Cited by 130 · Related articles · All 9 versions

Semantic exploration from language abstractions and pretrained representations

A Tam, N Rabinowitz, A Lampinen… - Advances in neural …, 2022 - proceedings.neurips.cc

Effective exploration is a challenge in reinforcement learning (RL). Novelty-based
exploration methods can suffer in high-dimensional state spaces, such as continuous …

Cited by 59 · Related articles · All 5 versions
