31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi (1), Mohammad Asgari (2)
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
The vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors, a form suited to the transformer architecture. This paper applies the ViT method to speech emotion recognition. Unlike the original ViT, which splits the image into square patches, we use time frames as patches. Alongside the frame-based ViT, which learns global features, we use a parallel convolutional neural network (CNN) that extracts local features and exploits the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, this model achieves an unweighted accuracy of 79.2%.
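
To make the difference between the two patching schemes concrete, the sketch below contrasts square patches with time frames used as patches. It is an illustration only, not the authors' implementation: the function names, the 40 x 200 MFCC matrix, and the patch size of 8 are all assumptions chosen for the example, since the abstract does not state these values.

```python
from typing import List

Matrix = List[List[float]]


def frames_as_patches(mfcc: Matrix) -> Matrix:
    """Treat every time frame (one column of the MFCC matrix) as one token.

    Input is laid out as (n_coeffs, n_frames); output is (n_frames, n_coeffs).
    """
    return [list(column) for column in zip(*mfcc)]


def square_patches(image: Matrix, patch: int) -> Matrix:
    """Split a 2-D input into non-overlapping patch x patch blocks.

    Each block is flattened row by row into one token.
    """
    rows, cols = len(image), len(image[0])
    if rows % patch or cols % patch:
        raise ValueError("input dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


# Hypothetical input: 40 coefficients over 200 time frames (assumed sizes).
N_COEFFS, N_FRAMES = 40, 200
mfcc = [[float(k * N_FRAMES + t) for t in range(N_FRAMES)] for k in range(N_COEFFS)]

frame_tokens = frames_as_patches(mfcc)       # one token per time frame
square_tokens = square_patches(mfcc, 8)      # one token per 8 x 8 block

print(len(frame_tokens), len(frame_tokens[0]))    # 200 40
print(len(square_tokens), len(square_tokens[0]))  # 125 64
```

With frame patches, each token holds the full set of coefficients for a single instant, so the sequence order coincides with time. A square patch instead mixes a few coefficients across several neighboring frames.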