Exploration vs exploitation with partially observable Gaussian autoregressive arms

Intersecting reinforcement learning and deep factor methods for optimizing locality and globality in forecasting: A review

J Sousa, R Henriques - Engineering Applications of Artificial Intelligence, 2024 - Elsevier

Operational forecasting often requires predicting collections of related, multivariate time
series data that are high-dimensional in nature. This can be tackled by fitting a single …

Cited by 4 · Related articles · All 5 versions

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Cited by 18 · Related articles · All 2 versions

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press

Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …

Cited by 21 · Related articles · All 3 versions

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Z Zhu, Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2310.07786, 2023 - arxiv.org

Real-world applications of contextual bandits often exhibit non-stationarity due to
seasonality, serendipity, and evolving social trends. While a number of non-stationary …

Cited by 3 · Related articles · All 3 versions

A definition of non-stationary bandits

Y Liu, X Kuang, B Van Roy - arXiv preprint arXiv:2302.12202, 2023 - arxiv.org

Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …

Cited by 5 · Related articles · All 2 versions

Optimal policies for observing time series and related restless bandit problems

CR Dance, T Silander - Journal of Machine Learning Research, 2019 - jmlr.org

The trade-off between the cost of acquiring and processing data, and uncertainty due to a
lack of data is fundamental in machine learning. A basic instance of this trade-off is the …

Cited by 14 · Related articles · All 4 versions

On local vs. population-based heuristics for ground station scheduling

A Lala, V Kolici, F Xhafa, X Herrero… - … on Complex, Intelligent …, 2015 - ieeexplore.ieee.org

Finding an optimal solution is computationally hard for most combinatorial optimization
problems. Therefore the use of heuristics methods aims at finding, if not optimal, near …

Cited by 15 · Related articles · All 4 versions

Wireless channel selection with reward-observing restless multi-armed bandits

J Kuhn, Y Nazarathy - Chapter to appear in “Markov Decision Processes in …, 2015 - Citeseer

Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …

Cited by 10 · Related articles

Efficient Deep Reinforcement Learning for Recommender Systems

Z Zhu - 2023 - search.proquest.com

Current recommender systems predominantly employ supervised learning algorithms, which
often fail to optimize for long-term user engagement. This short-sighted approach highlights …

Wireless channel selection with restless bandits

J Kuhn, Y Nazarathy - Markov Decision Processes in Practice, 2017 - Springer

Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …

Cited by 3 · Related articles · All 9 versions