31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi (1)
Mohammad Asgari (2)
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
Vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors suitable for the transformer architecture. This paper applies ViT to speech emotion recognition. Unlike the original ViT, which splits the image into square patches, we use time frames as patches. Alongside the frame-based ViT, which learns global features, we use a convolutional neural network in parallel, which extracts local features and exploits the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, this model achieves an unweighted accuracy of 79.2%.
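The difference between square patches and time-frame patches can be illustrated with a small NumPy sketch. It is not the authors' implementation: the MFCC matrix is random data, and the shapes (40 coefficients, 200 frames), the patch size of 8, the embedding width of 64, and the helper names `frame_patches`, `square_patches`, and `embed` are all assumptions chosen for illustration. The random projection only stands in for a learned patch embedding.

```python
import numpy as np


def frame_patches(mfcc: np.ndarray) -> np.ndarray:
    """One token per time frame: (n_mfcc, n_frames) -> (n_frames, n_mfcc)."""
    return mfcc.T.copy()


def square_patches(image: np.ndarray, p: int) -> np.ndarray:
    """Standard ViT-style patching: (H, W) -> ((H // p) * (W // p), p * p)."""
    h, w = image.shape
    if h % p or w % p:
        raise ValueError("image sides must be divisible by the patch size")
    # Split both axes into (block index, offset inside block), then group offsets.
    blocks = image.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, p * p)


def embed(tokens: np.ndarray, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Random linear projection standing in for a learned patch embedding."""
    weight = rng.standard_normal((tokens.shape[1], d_model)) / np.sqrt(tokens.shape[1])
    return tokens @ weight


rng = np.random.default_rng(0)
# Hypothetical input: 40 MFCCs over 200 time frames (random values, not real audio).
mfcc = rng.standard_normal((40, 200))

frame_tokens = frame_patches(mfcc)          # each token is one full time frame
square_tokens = square_patches(mfcc, p=8)   # each token is an 8 x 8 block
embedded = embed(frame_tokens, d_model=64, rng=rng)

print(frame_tokens.shape, square_tokens.shape, embedded.shape)
# prints: (200, 40) (125, 64) (200, 64)
```

With frame patches, every token holds all coefficients of a single time step, so the token sequence follows the temporal order of the utterance. With square patches, each token mixes a few frames with only a slice of the coefficients.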