31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi
1
Mohammad Asgari
2
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
The vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors, a form suited to the transformer architecture. This paper applies ViT to speech emotion recognition. Unlike the standard ViT, which splits the image into square patches, we use time frames as patches. Alongside the frame-based ViT, which is used for its ability to learn global features, we use a parallel convolutional neural network that extracts local features and exploits the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, this model achieved an unweighted accuracy of 79.2%.
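The difference between square patches and time-frame patches can be illustrated with a minimal sketch. It assumes the MFCC input is a matrix with one row per coefficient and one column per time frame. The function names, the toy matrix, and the patch size are hypothetical and chosen for illustration only; the abstract does not state the paper's frame length, number of coefficients, or embedding step.

```python
from typing import List

# Assumed layout: rows = MFCC coefficients, columns = time frames.
Matrix = List[List[float]]


def square_patches(spec: Matrix, size: int) -> List[List[float]]:
    """Standard ViT-style tokens: flatten each non-overlapping size x size tile."""
    n_rows, n_cols = len(spec), len(spec[0])
    tokens = []
    for r in range(0, n_rows - size + 1, size):
        for c in range(0, n_cols - size + 1, size):
            tokens.append(
                [spec[r + i][c + j] for i in range(size) for j in range(size)]
            )
    return tokens


def frame_patches(spec: Matrix) -> List[List[float]]:
    """Frame-based tokens: each time frame (one column) becomes one token."""
    return [list(column) for column in zip(*spec)]


# Toy input: 6 coefficients x 4 frames, value = 10 * coefficient + frame.
toy = [[10.0 * k + t for t in range(4)] for k in range(6)]

frames = frame_patches(toy)
squares = square_patches(toy, size=2)

print(len(frames), len(frames[0]))    # 4 tokens, each holding all 6 coefficients
print(len(squares), len(squares[0]))  # 6 tokens, each holding a 2 x 2 tile
```

With frame patches, every token carries the full set of coefficients for one instant, so the token sequence follows the time axis of the utterance. Square patches would instead mix a few coefficients from a few neighbouring frames in each token.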