Representation learning for online and offline RL in low-rank MDPs
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
Efficient reinforcement learning in block MDPs: A model-free representation learning approach
We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision
Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are …
Learning Bellman complete representations for offline policy evaluation
We study representation learning for Offline Reinforcement Learning (RL), focusing on the
important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to …
Shapley meets uniform: An axiomatic framework for attribution in online advertising
One of the central challenges in online advertising is attribution, namely, assessing the
contribution of individual advertiser actions including emails, display ads and search ads to …
Context-lumpable stochastic bandits
We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each
round $t = 1, 2, \dots$ the learner observes a random context and chooses an action based …
A Multi-Constraint Guidance and Maneuvering Penetration Strategy via Meta Deep Reinforcement Learning
S Zhao, J Zhu, W Bao, X Li, H Sun - Drones, 2023 - mdpi.com
In response to the issue of UAV escape guidance, this study proposed a unified intelligent
control strategy synthesizing optimal guidance and meta deep reinforcement learning …
Off-policy Evaluation with Deeply-abstracted States
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its
deployment. However, achieving accurate OPE in large state spaces remains challenging …
Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space
This paper delves into the problem of safe reinforcement learning (RL) in a partially
observable environment with the aim of achieving safe-reachability objectives. In traditional …
Primal-Dual Spectral Representation for Off-policy Evaluation
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement
learning (RL) to estimate the expected long-term payoff of a given target policy with only …