Adaptive ε-Greedy Exploration in Reinforcement Learning Based on Value Differences

M Tokic - Annual conference on artificial intelligence, 2010 - Springer
Abstract This paper presents “Value-Difference Based Exploration” (VDBE), a method for
balancing the exploration/exploitation dilemma inherent to reinforcement learning. The …

Real-time energy purchase optimization for a storage-integrated photovoltaic system by deep reinforcement learning

W Kolodziejczyk, I Zoltowska, P Cichosz - Control Engineering Practice, 2021 - Elsevier
The objective of this article is to minimize the cost of energy purchased on a real-time basis
for a storage-integrated photovoltaic (PV) system installed in a microgrid. Under non-linear …

Statistical inference for online decision making: In a contextual bandit setting

H Chen, W Lu, R Song - Journal of the American Statistical …, 2021 - Taylor & Francis
Online decision making problem requires us to make a sequence of decisions based on
incremental information. Common solutions often need to learn a reward model of different …

CARES: Context-aware trust estimation for realtime crowdsensing services in vehicular edge networks

SY Jang, SK Park, JH Cho, D Lee - ACM Transactions on Internet …, 2022 - dl.acm.org
The growing number of smart vehicles makes it possible to envision a crowdsensing service
where vehicles can share video data of their surroundings for seeking out traffic conditions …

Heuristic dynamic programming for mobile robot path planning based on Dyna approach

S Al Dabooni, D Wunsch - 2016 International Joint Conference …, 2016 - ieeexplore.ieee.org
This paper presents a direct heuristic dynamic programming (HDP) based on Dyna planning
(Dyna_HDP) for online model learning in a Markov decision process. This novel technique …

Road-reconstruction after multi-locational flooding in multi-agent deep RL with the consideration of human mobility - Case study: Western Japan flooding in 2018

S Joo, Y Ogawa, Y Sekimoto - International Journal of Disaster Risk …, 2022 - Elsevier
Record-breaking heavy rain occurred in Western Japan from June 28 to July 8, 2018. Many
roads in Hiroshima and Okayama Prefecture were disrupted simultaneously. The …

Self-regulation management in IoT infrastructure using machine learning

M Stepanova, O Eremin, A Proletarsky - Recent Innovations in Computing …, 2022 - Springer
The emergence of the IoT concept introduced new opportunities and challenges in creating
and functioning digital solutions that are IoT based. The IoT concept nature is a …

Scheduling for massive MIMO with hybrid precoding using contextual multi-armed bandits

WVF Mauricio, TF Maciel, A Klein… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
In this work we study different scheduling problems in the downlink of a Frequency Division
Duplex multiuser wireless system that employs a hybrid precoding antenna architecture for …

A reinforcement learning approach for task assignment in IoT distributed platform

O Eremin, M Stepanova - Cyber-Physical Systems: Digital Technologies …, 2021 - Springer
This chapter represents an adaptive method based on reinforcement learning for task
assignment in IoT distributed platform. The described experiments and results present the …

Assigning tasks to nodes of a distributed Internet of Things platform system based on reinforcement learning

МВ Степанова, ОЮ Ерёмин - Автоматизация процессов …, 2021 - researchgate.net
Abstract The paper considers the application of an adaptive approach
based on reinforcement learning to the distribution of …