Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc
Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Reinforcement learning, bit by bit

X Lu, B Van Roy, V Dwaracherla… - … and Trends® in …, 2023 - nowpublishers.com
Reinforcement learning agents have demonstrated remarkable achievements in simulated
environments. Data efficiency poses an impediment to carrying this success over to real …

Posterior meta-replay for continual learning

C Henning, M Cervera, F D'Angelo… - Advances in neural …, 2021 - proceedings.neurips.cc
Learning a sequence of tasks without access to iid observations is a widely studied form of
continual learning (CL) that remains challenging. In principle, Bayesian learning directly …

Scalable neural contextual bandit for recommender systems

Z Zhu, B Van Roy - Proceedings of the 32nd ACM International …, 2023 - dl.acm.org
High-quality recommender systems ought to deliver both innovative and relevant content
through effective and exploratory interactions with users. Yet, supervised learning-based …

Controllable Pareto multi-task learning

X Lin, Z Yang, Q Zhang, S Kwong - 2020 - openreview.net
A multi-task learning (MTL) system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks would be conflicted with each other, and the system …

Approximate Thompson sampling via epistemic neural networks

I Osband, Z Wen, SM Asghari… - Uncertainty in …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling
from a posterior distribution. Unfortunately, this can become computationally intractable in …

Efficient exploration for LLMs

V Dwaracherla, SM Asghari, B Hao… - arXiv preprint arXiv …, 2024 - arxiv.org
We present evidence of substantial benefit from efficient exploration in gathering human
feedback to improve large language models. In our experiments, an agent sequentially …

Visual affordance prediction for guiding robot exploration

H Bharadhwaj, A Gupta… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Motivated by the intuitive understanding humans have about the space of possible
interactions, and the ease with which they can generalize this understanding to previously …

Uncertainty estimation for language reward models

A Gleave, G Irving - arXiv preprint arXiv:2203.07472, 2022 - arxiv.org
Language models can learn a range of capabilities from unsupervised training on text
corpora. However, to solve a particular problem (such as text summarization) it is typically …

Meta-learning via hypernetworks

D Zhao, S Kobayashi… - 4th Workshop on …, 2020 - research-collection.ethz.ch
Recent developments in few-shot learning have shown that during fast adaption, gradient-
based meta-learners mostly rely on embedding features of powerful pretrained networks …