Intersecting reinforcement learning and deep factor methods for optimizing locality and globality in forecasting: A review

J Sousa, R Henriques - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Operational forecasting often requires predicting collections of related, multivariate time
series data that are high-dimensional in nature. This can be tackled by fitting a single …

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Z Zhu, Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2310.07786, 2023 - arxiv.org
Real-world applications of contextual bandits often exhibit non-stationarity due to
seasonality, serendipity, and evolving social trends. While a number of non-stationary …

A definition of non-stationary bandits

Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2302.12202, 2023 - arxiv.org
Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …

Optimal policies for observing time series and related restless bandit problems

CR Dance, T Silander - Journal of Machine Learning Research, 2019 - jmlr.org
The trade-off between the cost of acquiring and processing data, and uncertainty due to a
lack of data is fundamental in machine learning. A basic instance of this trade-off is the …

On local vs. population-based heuristics for ground station scheduling

A Lala, V Kolici, F Xhafa, X Herrero… - … on Complex, Intelligent …, 2015 - ieeexplore.ieee.org
Finding an optimal solution is computationally hard for most combinatorial optimization
problems. Therefore the use of heuristic methods aims at finding, if not optimal, near …

Wireless channel selection with reward-observing restless multi-armed bandits

J Kuhn, Y Nazarathy - Chapter to appear in “Markov Decision Processes in …, 2015 - Citeseer
Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …

Efficient Deep Reinforcement Learning for Recommender Systems

Z Zhu - 2023 - search.proquest.com
Current recommender systems predominantly employ supervised learning algorithms, which
often fail to optimize for long-term user engagement. This short-sighted approach highlights …

Wireless channel selection with restless bandits

J Kuhn, Y Nazarathy - Markov Decision Processes in Practice, 2017 - Springer
Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …