Large Language Models as Agents in Two-Player Games

Y Liu, P Sun, H Li - arXiv preprint arXiv:2402.08078, 2024 - arxiv.org
By formally defining the training processes of large language models (LLMs), which usually
encompass pre-training, supervised fine-tuning, and reinforcement learning with human …

Semi-supervised batch learning from logged data

G Aminian, A Behnamnia, R Vega, L Toni, C Shi… - openreview.net
Offline policy learning methods are intended to learn a policy from logged data, which
includes context, action, and reward for each sample point. In this work we build on the …
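The logged-data setting described in this snippet, one context, one logged action, and one observed reward per sample, is the standard off-policy learning setup. Below is a minimal sketch of inverse-propensity-scoring (IPS) off-policy value estimation on such data; it assumes the logging propensities are known, uses synthetic data, and illustrates the common baseline rather than the method proposed in the paper.

import numpy as np

# Logged bandit data: one context x_i, one logged action a_i, and one observed
# reward r_i per sample, plus the logging policy's propensity pi_0(a_i | x_i).
# Everything here is synthetic and purely illustrative.
rng = np.random.default_rng(0)
n, n_actions, d = 1000, 5, 8
contexts = rng.normal(size=(n, d))
logged_actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.3, size=n).astype(float)
propensities = np.full(n, 1.0 / n_actions)   # uniform logging policy

def ips_value(target_probs, rewards, propensities):
    # target_probs[i] = pi(a_i | x_i): probability the target policy assigns
    # to the action that was actually logged for sample i.
    weights = target_probs / propensities
    return np.mean(weights * rewards)

# Off-policy value estimate for a (hypothetical) uniform target policy.
target_probs = np.full(n, 1.0 / n_actions)
print("IPS value estimate:", ips_value(target_probs, rewards, propensities))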

Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback

A Behnamnia, G Aminian, A Aghaei, C Shi… - ICML 2024 Workshop … - openreview.net
Offline policy learning methods in batch learning aim to derive a policy from a logged bandit
feedback dataset, encompassing context, action, propensity score and feedback for each …
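The title names a log-sum-exponential (LSE) estimator for batch learning from logged bandit feedback. As a hedged sketch of the general LSE idea, the snippet below replaces the plain empirical mean of importance-weighted feedback with a LogSumExp-style aggregation whose temperature damps heavy-tailed importance weights; the parameterization and the name lse_estimator are illustrative assumptions, not necessarily the estimator defined in the paper.

import numpy as np

def lse_estimator(weighted_rewards, lam=-1.0):
    # LogSumExp-style aggregation: (1/lam) * log( mean( exp(lam * x_i) ) ),
    # where x_i are the importance-weighted rewards. As lam -> 0 this tends
    # to the plain empirical mean (the IPS estimate); a negative lam damps
    # the influence of very large importance weights.
    z = lam * np.asarray(weighted_rewards, dtype=float)
    m = z.max()                      # subtract the max for numerical stability
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

# Heavy-tailed importance weights (log-normal) times binary feedback.
rng = np.random.default_rng(1)
weights = np.exp(rng.normal(0.0, 2.0, size=1000))
feedback = rng.binomial(1, 0.3, size=1000)
weighted = weights * feedback
print("plain IPS mean:", weighted.mean())
print("LSE estimate  :", lse_estimator(weighted, lam=-1.0))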