[HTML] Intersecting reinforcement learning and deep factor methods for optimizing locality and globality in forecasting: A review
J Sousa, R Henriques - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Operational forecasting often requires predicting collections of related, multivariate time
series data that are high-dimensional in nature. This can be tackled by fitting a single …
Continual learning as computationally constrained reinforcement learning
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …
Nonstationary bandit learning via predictive sampling
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …
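The snippet above names Thompson sampling without describing it. The sketch below is a minimal illustration of the standard stationary version. It assumes Bernoulli rewards and a Beta(1, 1) prior on each arm, which are illustrative choices and do not come from the paper. The function name `thompson_sampling` and the arm probabilities are hypothetical. This is not the paper's predictive sampling method.

```python
import random


def thompson_sampling(true_probs, n_steps, seed=0):
    """Beta-Bernoulli Thompson sampling on a stationary bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, fixed over time.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumed Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1.0] * k
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(n_steps):
        # Draw one plausible mean per arm from its current posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and pull the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Beta update for a Bernoulli reward.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], n_steps=2000)
print(pulls)
```

Each arm's posterior only ever concentrates as observations accumulate. The sketch therefore implicitly assumes that the arm means never change over time, which is the stationary setting the snippet refers to.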
Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
Real-world applications of contextual bandits often exhibit non-stationarity due to
seasonality, serendipity, and evolving social trends. While a number of non-stationary …
A definition of non-stationary bandits
Despite the subject of non-stationary bandit learning having attracted much recent attention,
we have yet to identify a formal definition of non-stationarity that can consistently distinguish …
Optimal policies for observing time series and related restless bandit problems
CR Dance, T Silander - Journal of Machine Learning Research, 2019 - jmlr.org
The trade-off between the cost of acquiring and processing data, and uncertainty due to a
lack of data is fundamental in machine learning. A basic instance of this trade-off is the …
On local vs. population-based heuristics for ground station scheduling
Finding an optimal solution is computationally hard for most combinatorial optimization
problems. Therefore, the use of heuristic methods aims at finding, if not optimal, near …
[PDF] Wireless channel selection with reward-observing restless multi-armed bandits
J Kuhn, Y Nazarathy - Chapter to appear in “Markov Decision Processes in …, 2015 - Citeseer
Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …
Efficient Deep Reinforcement Learning for Recommender Systems
Z Zhu - 2023 - search.proquest.com
Current recommender systems predominantly employ supervised learning algorithms, which
often fail to optimize for long-term user engagement. This short-sighted approach highlights …
Wireless channel selection with restless bandits
J Kuhn, Y Nazarathy - Markov Decision Processes in Practice, 2017 - Springer
Wireless devices are often able to communicate on several alternative channels; for
example, cellular phones may use several frequency bands and are equipped with base …