Iranian Conference on Electrical Engineering

33rd International Conference on Electrical Engineering

Better Exploration In Single-Agent Q-Learning Using Controlled Linear Perturbation

Authors:

Sadredin Hokmi¹ Mohammad Haeri²

1- Sharif University of Technology 2- Sharif University of Technology

Keywords:

Q-learning, Exploration, Controlled linear perturbation, Convergence rate, Maze, Cart-Pole

Abstract:

Reinforcement learning algorithms, especially model-free algorithms such as Q-learning, have shown reliable results in finding optimal solutions for many real-time applications. However, challenges such as real-time exploration and the convergence rate remain, and many studies have proposed algorithms to address them, including speedy Q-learning, Zap Q-learning, methods that add a regularization term, noise injection, and others. In this paper, an algorithm based on controlled linear perturbation is presented which, according to the numerical results, can significantly reduce unnecessary explorations that are risky in real time. Additionally, the proposed algorithm does not depend on the learning rate α, the discount factor γ, or changes in these coefficients. However, to be effective, its parameters must be chosen within the correct range. The proposed algorithm has been compared with three reliable algorithms, namely standard Q-learning, speedy Q-learning, and noise injection, in a 9×9 maze scenario and in the cart-pole environment.
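The abstract does not spell out the controlled linear perturbation update, so it is not reproduced here. As a point of reference, the following is a minimal sketch of the standard Q-learning baseline with ε-greedy exploration on a 9×9 grid. It assumes a hypothetical wall-free layout, a reward of 1 for reaching the goal corner and 0 otherwise, and illustrative values for α, γ and ε; the names `step`, `greedy` and `q_learning` are hypothetical. The exploration counter marks where the random, potentially risky actions discussed in the abstract arise.

```python
import random

# Hypothetical 9x9 grid with no walls (the paper's actual maze layout is not given)
N = 9
START, GOAL = (0, 0), (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move within the grid; bumping into the border leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
    done = nxt == GOAL
    reward = 1.0 if done else 0.0  # assumed reward: 1 at the goal, 0 elsewhere
    return nxt, reward, done


def greedy(Q, state):
    """Index of the highest-valued action, breaking ties at random."""
    best = max(Q[state])
    return random.choice([a for a, v in enumerate(Q[state]) if v == best])


def q_learning(episodes=300, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=500, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative hyperparameters)."""
    random.seed(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(N) for c in range(N)}
    explorations = 0  # counts only the epsilon-driven random actions
    steps_per_episode = []
    for _ in range(episodes):
        s = START
        for t in range(1, max_steps + 1):
            if random.random() < epsilon:
                a = random.randrange(len(ACTIONS))
                explorations += 1
            else:
                a = greedy(Q, s)
            s_next, reward, done = step(s, a)
            # Standard target r + gamma * max_b Q(s', b); no bootstrapping past the goal
            target = reward if done else reward + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
        steps_per_episode.append(t)
    return Q, steps_per_episode, explorations
```

For example, `Q, steps, explorations = q_learning()` returns the learned table, the length of each episode, and the number of ε-driven random actions taken during training; per the abstract, the proposed method aims to reduce such unnecessary explorations.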

List of archived papers

Effects of Derating Factor and Minimum Short Circuit Current on the BOP Cable Sizing of a Power Plant

Hossein Zamanpour abyaneh

A New High Voltage Gain Z-Source Based DC-DC Converter for High-Power DG Applications

Sakina Bakhshi - Reza Beiranvand

A Communication-Aware Scheduler for Containers in a Kubernetes Environment Using Girvan-Newman Clustering

Marzie Norouzi Dehnashi - Mahmoud Momtazpour - Seyyed Ahmad Javadi

The Theory of A Novel Circuit Design For An Identification Processor Using 0.35µm CMOS Technology

Rouhollah Mohammadinasr - Kheirollah Hadidi - Farhad Piri

Improving Noise Variance Estimation Using the Variance of Signal Variations

مجید دهقانیزاده - مسعودرضا آقابزرگی

Analysis of Effective and Functional Connectivity of the Driver's Physiological Signals to Improve Distraction Detection

نیلوفر وثوق - زهرا بهمنی دهکردی - امین محمدیان

Mapping Human Grasping to 3-Finger Grippers: A Deep Learning Perspective

Fatemeh Naeinian - Elnaz Balazadeh - Mehdi Tale Masouleh

A Hybrid Computer-aided Diagnosis System For Central Obesity Screening In A Large Sample Of Iranian Children and Adolescents

Amirhossein Koochekian - Morteza Farahi - Hamid Reza Sadr manouchehri Naeini - Mohammad Reza Mohebian - Hamid Reza Marateb - Marjan Mansourian - Roya Kelishadi

Performance improvement of automated parking by considering road incline and wheel slippage

Ali Anisi - Moosa Ayati - Yassin Riyazi - Ali Asadian

Reactive Power Management of PV Systems by Distributed Cooperative Control in Low Voltage Distribution Networks

Saeed Mahdavian Rostami - Mohsen Hamzeh
