31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi (1), Mohammad Asgari (2)
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
The vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors, a form suited to the transformer architecture. This paper applies the ViT method to speech emotion recognition. Unlike the standard ViT, which splits the image into square patches, we use time frames as patches. Alongside the frame-based ViT, which is used for its ability to learn global features, we use a parallel convolutional neural network that extracts local features and focuses on the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, the model achieves an unweighted accuracy of 79.2%.
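The difference between square patches and time-frame patches can be illustrated with a small NumPy sketch. It is not the authors' implementation: the array sizes (40 coefficients, 128 frames, 64-dimensional embedding), the random projection weights, and the helper names `frames_as_patches`, `square_patches`, and `embed` are assumptions chosen only for illustration, and the MFCC matrix is random data standing in for features extracted from audio.

```python
import numpy as np


def frames_as_patches(mfcc: np.ndarray) -> np.ndarray:
    """Treat each time frame (column) of an MFCC matrix as one token.

    mfcc has shape (n_mfcc, n_frames); the result has shape (n_frames, n_mfcc).
    """
    return mfcc.T.copy()


def square_patches(image: np.ndarray, p: int) -> np.ndarray:
    """Standard ViT-style split into non-overlapping p x p patches, each flattened."""
    h, w = image.shape
    if h % p or w % p:
        raise ValueError("image sides must be divisible by the patch size")
    # (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_patches, p*p)
    patches = image.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)


def embed(tokens: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear projection of each token to the transformer's model dimension."""
    return tokens @ weight + bias


rng = np.random.default_rng(0)
n_mfcc, n_frames, d_model = 40, 128, 64          # hypothetical sizes
mfcc = rng.standard_normal((n_mfcc, n_frames))   # stand-in for real MFCC features

# Frame-based tokens: one token per time frame, sequence length = n_frames.
tokens = frames_as_patches(mfcc)

# Randomly initialised projection; in a trained model these would be learned.
W = rng.standard_normal((n_mfcc, d_model)) * 0.02
b = np.zeros(d_model)
sequence = embed(tokens, W, b)

# For comparison: 8 x 8 square patches of the same matrix.
sq = square_patches(mfcc, 8)

print(tokens.shape, sequence.shape, sq.shape)
```

With frame-based patches the sequence length equals the number of time frames, so each token carries the full set of coefficients for one instant, whereas square patches mix a few coefficients across several neighbouring frames.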