33rd International Conference on Electrical Engineering
Better Exploration In Single-Agent Q-Learning Using Controlled Linear Perturbation
Authors:
Sadredin Hokmi (1), Mohammad Haeri (2)
1- Sharif University of Technology
2- Sharif University of Technology
Keywords:
Q-learning, Exploration, Controlled linear perturbation, Convergence rate, Maze, Cart-Pole
Abstract:
Reinforcement learning algorithms, especially model-free algorithms such as Q-learning, have shown reliable results in finding optimal solutions for many real-time applications. However, challenges such as real-time exploration and the convergence rate still need to be addressed, and many studies have proposed algorithms to tackle them, including speedy Q-learning, Zap Q-learning, methods that add a regularization term, noise injection, and others. This paper presents an algorithm based on controlled linear perturbation which, according to the numerical results, can significantly reduce unnecessary exploration that is risky in real time. Additionally, the proposed algorithm does not depend on the learning rate α, the discount factor γ, or changes in coefficients. To be effective, however, its parameters should be chosen within the correct range. The proposed algorithm is compared with three established algorithms: standard Q-learning, speedy Q-learning, and noise injection. The comparisons were conducted in a 9x9 maze scenario and in the cart-pole environment.
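For orientation, the standard Q-learning baseline named in the abstract can be sketched as follows. This is a minimal illustration of the textbook tabular update with epsilon-greedy exploration, not the proposed controlled linear perturbation algorithm, whose details are not given on this page. The function names (`q_update`, `epsilon_greedy`) and the values chosen for α, γ, and ε are assumptions made only for this example.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place.

    alpha (learning rate) and gamma (discount factor) are illustrative
    values, not parameters taken from the paper.
    """
    best_next = max(q[next_state])            # greedy value of the next state
    td_target = reward + gamma * best_next    # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


# Tiny illustration: 2 states, 2 actions, all values start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1)
greedy_action = epsilon_greedy(q, state=0, epsilon=0.0)
```

With all values initialised to zero, a single update moves the visited entry by α times the reward, and the greedy policy then prefers that action. The random branch of `epsilon_greedy` is the kind of undirected exploration that the abstract describes as risky in real time.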
List of archived papers
Multiswarm Binary Butterfly Optimization Algorithm for Solving the Multidimensional Knapsack Problem
Shakiba Shahbandegan - Madjid Naderi
Modeling and control of two PPR cooperative manipulations with a passive joint
Hassan Khosravi - Farhad Fani Saberi - Rasul Fesharakifard
Investigating Validity and Reliability of The Features Extracted by a 5R Vertical Robot for Arm Motion and Learning Assessment
Sarvenaz Bourbour - Fariba Bahrami Boodelalou - Ghorban Taghizadeh
A Dynamic Comparator Capable of Subthreshold Operation Based on Pseudo-NMOS Logic
Seyed Saeed Hosseini Dolatabadi - Mohsen Jalali
A Fault-Parameter-Independent Method for Detecting, Classifying, and Locating the Faulted Section in Multi-Terminal Transmission Systems Based on the Discrete Wavelet Transform
Ehsan Akbari - Abdolreza Sheikholeslami
Smart EV Charging in Residential Power Grids Considering Users’ Preferences
Mahya Shahshahani - Ali Moradi Amani - Mahdi Jalili
Design, Simulation, and fabrication of a compact dual-band GNSS antenna
Farnoosh Abbasi - Amir Saman Nooramin
A New Method on Failure Detection of Fixed and Moving Contacts of Circuit Breakers
Hassan Hamidi - Ali Asghar Razi Kazemi
A Novel Tunable LC Filter For Ultra High Frequency Applications
Davoud Razaghpour - Mir Majid Ghasemi - Amir Fathi
Refractive Index Sensor Based on Photonic Crystal Nanocavities
Mohammad Zargarzadeh - Mohammad Hasan Yavari - Mohammad Heydari - Mohammad Hasan Rezaei