On the opportunities and challenges of offline reinforcement learning for recommender systems

X Chen, S Wang, J McAuley, D Jannach… - ACM Transactions on …, 2024 - dl.acm.org
Reinforcement learning serves as a potent tool for modeling dynamic user interests within
recommender systems, garnering increasing research attention of late. However, a …

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

CLUE: Calibrated latent guidance for offline reinforcement learning

J Liu, L Zu, L He, D Wang - Conference on Robot Learning, 2023 - proceedings.mlr.press
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and
labeled datasets, which eliminates the time-consuming data collection in online RL …

Reinformer: Max-return sequence modeling for offline RL

Z Zhuang, D Peng, J Liu, Z Zhang, D Wang - arXiv preprint arXiv …, 2024 - arxiv.org
As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as
sequence modeling that conditions on the hindsight information including returns, goal or …

OER: Offline Experience Replay for Continual Offline Reinforcement Learning

S Gai, D Wang, L He - arXiv preprint arXiv:2305.13804, 2023 - arxiv.org
The capability of continuously learning new skills via a sequence of pre-collected offline
datasets is desired for an agent. However, consecutively learning a sequence of offline tasks …

DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation

J Liu, X Guo, Z Zhuang, D Wang - arXiv preprint arXiv:2405.14790, 2024 - arxiv.org
In this paper, we propose a novel approach called DIffusion-guided DIversity (DIDI) for
offline behavioral generation. The goal of DIDI is to learn a diverse set of skills from a …

Enhancing Autonomous Lane-Changing Safety: Deep Reinforcement Learning via Pre-Exploration in Parallel Imaginary Environments

Z Hu, F Yang, Z Lu, J Chen - IEEE Transactions on Industrial …, 2024 - ieeexplore.ieee.org
The connected and autonomous vehicles combined with deep reinforcement learning (DRL)
are capable of handling complex driving scenarios. However, due to the random exploration …

ODRL: A Benchmark for Off-Dynamics Reinforcement Learning

J Lyu, K Xu, J Xu, M Yan, J Yang, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We consider off-dynamics reinforcement learning (RL) where one needs to transfer policies
across different domains with dynamics mismatch. Despite the focus on developing …

SAD: State-Action Distillation for In-Context Reinforcement Learning under Random Policies

W Chen, S Paternain - arXiv preprint arXiv:2410.19982, 2024 - arxiv.org
Pretrained foundation models have exhibited extraordinary in-context learning performance,
allowing zero-shot generalization to new tasks not encountered during the pretraining. In the …