- 学术资源搜索

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

被引用次数：196 相关文章所有 13 个版本

[PDF] arxiv.org

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

被引用次数：138 相关文章所有 3 个版本

[PDF] neurips.cc

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

被引用次数：311 相关文章所有 8 个版本

[PDF] neurips.cc

Morel: Model-based offline reinforcement learning

R Kidambi, A Rajeswaran… - Advances in neural …, 2020 - proceedings.neurips.cc

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based
solely on a dataset of historical interactions with the environment. This serves as an extreme …

被引用次数：775 相关文章所有 7 个版本

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

被引用次数：203 相关文章所有 6 个版本

[PDF] mlr.press

Pessimistic q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

被引用次数：105 相关文章所有 10 个版本

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

被引用次数：242 相关文章所有 7 个版本

[PDF] neurips.cc

Robust reinforcement learning using offline data

K Panaganti, Z Xu, D Kalathil… - Advances in neural …, 2022 - proceedings.neurips.cc

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the
uncertainty in model parameters. Parameter uncertainty commonly occurs in many real …

被引用次数：85 相关文章所有 8 个版本

[PDF] informs.org

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org

Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

被引用次数：228 相关文章所有 15 个版本

[PDF] nowpublishers.com

Spectral methods for data science: A statistical perspective

Y Chen, Y Chi, J Fan, C Ma - Foundations and Trends® in …, 2021 - nowpublishers.com

Spectral methods have emerged as a simple yet surprisingly effective approach for
extracting information from massive, noisy and incomplete data. In a nutshell, spectral …

被引用次数：178 相关文章所有 7 个版本