31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi (1), Mohammad Asgari (2)
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
Vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors suitable for the transformer architecture. This paper applies ViT to speech emotion recognition. Unlike the original ViT, which splits the image into square patches, we use time frames as patches. Alongside the frame-based ViT, which learns global features, we use a parallel convolutional neural network that extracts local features and exploits the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, the model achieves an unweighted accuracy of 79.2%.
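The difference between square patches and time-frame patches can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy matrix, and the patch size are all hypothetical, and it covers only the tokenization step, with no embedding, attention, or CNN branch. It assumes the MFCC matrix is laid out as coefficients (rows) by time frames (columns).

```python
# Illustrative sketch only: token extraction, not the network from the paper.
# All names and sizes below are assumptions made for this example.

def frame_tokens(mfcc):
    """Treat each time frame (column) as one token.

    mfcc is a list of n_mfcc rows, each holding n_frames values.
    Returns n_frames tokens, each of length n_mfcc.
    """
    return [list(column) for column in zip(*mfcc)]


def square_patch_tokens(mfcc, patch):
    """Image-style tokenization: flatten non-overlapping patch x patch blocks."""
    n_rows, n_cols = len(mfcc), len(mfcc[0])
    if n_rows % patch or n_cols % patch:
        raise ValueError("matrix dimensions must be divisible by the patch size")
    tokens = []
    for r in range(0, n_rows, patch):
        for c in range(0, n_cols, patch):
            tokens.append(
                [mfcc[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return tokens


# Hypothetical toy input: 6 coefficients x 4 frames, value = 10 * row + column.
toy_mfcc = [[10 * r + c for c in range(4)] for r in range(6)]

frames = frame_tokens(toy_mfcc)
patches = square_patch_tokens(toy_mfcc, patch=2)

print(len(frames), len(frames[0]))    # number of frame tokens, token length
print(len(patches), len(patches[0]))  # number of square-patch tokens, token length
```

With frame tokens, every token holds all coefficients of a single time step, so the sequence order is the temporal order of the utterance. Square patches instead mix a few neighbouring coefficients with a few neighbouring frames in each token.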