- Academic resource search

Representation learning for online and offline RL in low-rank MDPs

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org

This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

Cited by 153 · Related articles · All 3 versions

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org

We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Cited by 159 · Related articles · All 4 versions

Efficient reinforcement learning in block MDPs: A model-free representation learning approach

X Zhang, Y Song, M Uehara, M Wang… - International …, 2022 - proceedings.mlr.press

We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision
Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are …

Cited by 67 · Related articles · All 4 versions

Learning Bellman complete representations for offline policy evaluation

J Chang, K Wang, N Kallus… - … Conference on Machine …, 2022 - proceedings.mlr.press

We study representation learning for Offline Reinforcement Learning (RL), focusing on the
important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to …

Cited by 14 · Related articles · All 4 versions

Shapley meets uniform: An axiomatic framework for attribution in online advertising

R Singal, O Besbes, A Desir, V Goyal… - The world wide web …, 2019 - dl.acm.org

One of the central challenges in online advertising is attribution, namely, assessing the
contribution of individual advertiser actions including emails, display ads and search ads to …

Cited by 56 · Related articles · All 9 versions

Context-lumpable stochastic bandits

CW Lee, Q Liu, Y Abbasi Yadkori… - Advances in …, 2024 - proceedings.neurips.cc

We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each
round $t = 1, 2, \dots$ the learner observes a random context and chooses an action based …

Cited by 2 · Related articles · All 7 versions

A Multi-Constraint Guidance and Maneuvering Penetration Strategy via Meta Deep Reinforcement Learning

S Zhao, J Zhu, W Bao, X Li, H Sun - Drones, 2023 - mdpi.com

In response to the issue of UAV escape guidance, this study proposed a unified intelligent
control strategy synthesizing optimal guidance and meta deep reinforcement learning …

Cited by 5 · Related articles · All 4 versions

Off-policy Evaluation with Deeply-abstracted States

M Hao, P Su, L Hu, Z Szabo, Q Zhao, C Shi - arXiv preprint arXiv …, 2024 - arxiv.org

Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its
deployment. However, achieving accurate OPE in large state spaces remains challenging …

Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

X Cheng, B Chen, L Varga, Y Hu - arXiv preprint arXiv:2312.00727, 2023 - arxiv.org

This paper delves into the problem of safe reinforcement learning (RL) in a partially
observable environment with the aim of achieving safe-reachability objectives. In traditional …

Primal-Dual Spectral Representation for Off-policy Evaluation

Y Hu, T Chen, N Li, K Wang, B Dai - arXiv preprint arXiv:2410.17538, 2024 - arxiv.org

Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement
learning (RL) to estimate the expected long-term payoff of a given target policy with only …