Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

ChiPFormer: Transferable chip placement via offline decision transformer

Y Lai, J Liu, Z Tang, B Wang, J Hao… - … on Machine Learning, 2023 - proceedings.mlr.press
Placement is a critical step in modern chip design, aiming to determine the positions of
circuit modules on the chip canvas. Recent works have shown that reinforcement learning …

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically, RLHF algorithms operate in two …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

Direct preference-based policy optimization without reward modeling

G An, J Lee, X Zuo, N Kosaka… - Advances in Neural …, 2023 - proceedings.neurips.cc
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to
learn from preferences, which is particularly useful when formulating a reward function is …

Beyond OOD state actions: Supported cross-domain offline reinforcement learning

J Liu, Z Zhang, Z Wei, Z Zhuang, Y Kang… - Proceedings of the …, 2024 - ojs.aaai.org
Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed
data. Although it avoids the time-consuming online interactions of RL, it poses challenges …

CLUE: Calibrated latent guidance for offline reinforcement learning

J Liu, L Zu, L He, D Wang - Conference on Robot Learning, 2023 - proceedings.mlr.press
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected and
labeled datasets, which eliminates the time-consuming data collection in online RL …

Flow to better: Offline preference-based reinforcement learning via preferred trajectory generation

Z Zhang, Y Sun, J Ye, TS Liu, J Zhang… - The Twelfth International …, 2023 - openreview.net
Offline preference-based reinforcement learning (PbRL) offers an effective solution to
overcome the challenges associated with designing rewards and the high costs of online …