On the opportunities and challenges of offline reinforcement learning for recommender systems
Reinforcement learning serves as a potent tool for modeling dynamic user interests within
recommender systems, garnering increasing research attention of late. However, a …
CEIL: Generalized contextual imitation learning
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …
Design from policies: Conservative test-time adaptation for offline policy optimization
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …
CLUE: Calibrated latent guidance for offline reinforcement learning
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and
labeled datasets, which eliminates the time-consuming data collection in online RL …
Reinformer: Max-return sequence modeling for offline RL
As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as
sequence modeling that conditions on the hindsight information including returns, goal or …
OER: Offline Experience Replay for Continual Offline Reinforcement Learning
S Gai, D Wang, L He - arXiv preprint arXiv:2305.13804, 2023 - arxiv.org
The capability of continuously learning new skills via a sequence of pre-collected offline
datasets is desired for an agent. However, consecutively learning a sequence of offline tasks …
DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation
In this paper, we propose a novel approach called DIffusion-guided DIversity (DIDI) for
offline behavioral generation. The goal of DIDI is to learn a diverse set of skills from a …
Enhancing Autonomous Lane-Changing Safety: Deep Reinforcement Learning via Pre-Exploration in Parallel Imaginary Environments
The connected and autonomous vehicles combined with deep reinforcement learning (DRL)
are capable of handling complex driving scenarios. However, due to the random exploration …
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning
We consider off-dynamics reinforcement learning (RL) where one needs to transfer policies
across different domains with dynamics mismatch. Despite the focus on developing …
SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies
W Chen, S Paternain - arXiv preprint arXiv:2410.19982, 2024 - arxiv.org
Pretrained foundation models have exhibited extraordinary in-context learning performance,
allowing zero-shot generalization to new tasks not encountered during the pretraining. In the …