Improving the exploration strategy in bandit algorithms

M Tokic - Annual conference on artificial intelligence, 2010 - Springer

Abstract This paper presents “Value-Difference Based Exploration” (VDBE), a method for
balancing the exploration/exploitation dilemma inherent to reinforcement learning. The …

Cited by 482 · Related articles · All 13 versions

Real-time energy purchase optimization for a storage-integrated photovoltaic system by deep reinforcement learning

W Kolodziejczyk, I Zoltowska, P Cichosz - Control Engineering Practice, 2021 - Elsevier

The objective of this article is to minimize the cost of energy purchased on a real-time basis
for a storage-integrated photovoltaic (PV) system installed in a microgrid. Under non-linear …

Cited by 37 · Related articles · All 3 versions

Statistical inference for online decision making: In a contextual bandit setting

H Chen, W Lu, R Song - Journal of the American Statistical …, 2021 - Taylor & Francis

Online decision making problem requires us to make a sequence of decisions based on
incremental information. Common solutions often need to learn a reward model of different …

Cited by 48 · Related articles · All 11 versions

CARES: Context-aware trust estimation for realtime crowdsensing services in vehicular edge networks

SY Jang, SK Park, JH Cho, D Lee - ACM Transactions on Internet …, 2022 - dl.acm.org

The growing number of smart vehicles makes it possible to envision a crowdsensing service
where vehicles can share video data of their surroundings for seeking out traffic conditions …

Cited by 6 · Related articles · All 4 versions

Heuristic dynamic programming for mobile robot path planning based on Dyna approach

S Al Dabooni, D Wunsch - 2016 International Joint Conference …, 2016 - ieeexplore.ieee.org

This paper presents a direct heuristic dynamic programming (HDP) based on Dyna planning
(Dyna_HDP) for online model learning in a Markov decision process. This novel technique …

Cited by 29 · Related articles · All 2 versions

Road-reconstruction after multi-locational flooding in multi-agent deep RL with the consideration of human mobility-Case study: Western Japan flooding in 2018

S Joo, Y Ogawa, Y Sekimoto - International Journal of Disaster Risk …, 2022 - Elsevier

Record-breaking heavy rain occurred in Western Japan from June 28 to July 8, 2018. Many
roads in Hiroshima and Okayama Prefecture were disrupted simultaneously. The …

Cited by 11 · Related articles · All 4 versions

Self-regulation management in IoT infrastructure using machine learning

M Stepanova, O Eremin, A Proletarsky - Recent Innovations in Computing …, 2022 - Springer

The emergence of the IoT concept introduced new opportunities and challenges in creating
and functioning digital solutions that are IoT based. The IoT concept nature is a …

Cited by 7 · Related articles · All 4 versions

Scheduling for massive MIMO with hybrid precoding using contextual multi-armed bandits

WVF Mauricio, TF Maciel, A Klein… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

In this work we study different scheduling problems in the downlink of a Frequency Division
Duplex multiuser wireless system that employs a hybrid precoding antenna architecture for …

Cited by 5 · Related articles · All 2 versions

A reinforcement learning approach for task assignment in IoT distributed platform

O Eremin, M Stepanova - Cyber-Physical Systems: Digital Technologies …, 2021 - Springer

This chapter represents an adaptive method based on reinforcement learning for task
assignment in IoT distributed platform. The described experiments and results present the …

Cited by 7 · Related articles · All 4 versions

[PDF] Task assignment to nodes of a distributed system of an Internet of Things platform based on reinforcement learning

МВ Степанова, ОЮ Ерёмин - Автоматизация процессов …, 2021 - researchgate.net

Abstract This paper considers the application of an adaptive approach
based on reinforcement learning for the distribution of …

Cited by 7 · Related articles · All 3 versions