31st International Conference on Electrical Engineering
Vision Transformer and Parallel Convolutional Neural Network for Speech Emotion Recognition
Authors:
Saber Hashemi (1)
Mohammad Asgari (2)
1- IRIB University
2- IRIB University
Keywords:
speech emotion recognition, vision transformer, convolutional neural network, attention mechanism
Abstract:
The vision transformer (ViT) is a recent approach to image processing tasks. It splits an image into patches and converts them into a sequence of vectors, a form suited to the transformer architecture. This paper applies ViT to speech emotion recognition. The original ViT splits the image into square patches, but we use time frames as patches. The frame-based ViT lets the model learn global features. Alongside it, we use a convolutional neural network that extracts local features and exploits the two-dimensional structure of the input. Mel-frequency cepstral coefficients (MFCCs) extracted from the audio files serve as input to the proposed network. On the RAVDESS dataset, the model achieves an unweighted accuracy of 79.2%.
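The abstract describes two parallel branches over an MFCC matrix. The first is a ViT whose patches are time frames rather than square image tiles. The second is a CNN for local structure. The NumPy sketch below illustrates that data flow only. Several settings are hypothetical choices, not the authors' configuration:

- the number of MFCCs
- frames per patch
- embedding width
- filter count
- the single attention head
- concatenation as the fusion step
- eight output classes, assumed from the eight emotion labels in RAVDESS

All weights are untrained random values. The sketch shows a forward pass and tensor shapes, not training.

```python
import numpy as np

# All weights below are random stand-ins for trained parameters; the sizes
# (40 MFCCs, 4 frames per patch, d_model=64, 16 filters, 8 classes) are
# hypothetical choices, not values from the paper.

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_patches(mfcc, frames_per_patch):
    """Split an (n_mfcc, T) MFCC matrix into time-frame patches.

    Each patch covers every coefficient over `frames_per_patch` consecutive
    frames and is flattened to one vector; leftover frames are dropped.
    """
    n_mfcc, n_frames = mfcc.shape
    n_patches = n_frames // frames_per_patch
    trimmed = mfcc[:, : n_patches * frames_per_patch]
    # (n_mfcc, n_patches, fpp) -> (n_patches, n_mfcc, fpp) -> (n_patches, n_mfcc * fpp)
    patches = trimmed.reshape(n_mfcc, n_patches, frames_per_patch).transpose(1, 0, 2)
    return patches.reshape(n_patches, n_mfcc * frames_per_patch)

def embed_tokens(patches, d_model, rng):
    """Linear patch projection, prepended class token, additive positional embedding."""
    n_patches, patch_dim = patches.shape
    w = rng.standard_normal((patch_dim, d_model)) / np.sqrt(patch_dim)
    cls = rng.standard_normal((1, d_model))
    pos = 0.02 * rng.standard_normal((n_patches + 1, d_model))
    return np.vstack([cls, patches @ w]) + pos

def self_attention(x, rng):
    """Single-head scaled dot-product self-attention: every token attends to all others."""
    d = x.shape[1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    weights = softmax(q @ k.T / np.sqrt(d), axis=1)  # (tokens, tokens), rows sum to 1
    return weights @ v, weights

def conv_branch(mfcc, n_filters, rng, k=3):
    """One 'valid' k x k convolution (cross-correlation, as in CNN libraries),
    ReLU, then global average pooling: a minimal local-feature extractor."""
    kernels = rng.standard_normal((n_filters, k, k)) / k
    windows = np.lib.stride_tricks.sliding_window_view(mfcc, (k, k))  # (H-k+1, W-k+1, k, k)
    fmap = np.maximum(np.einsum("hwij,fij->fhw", windows, kernels), 0.0)
    return fmap.mean(axis=(1, 2))  # (n_filters,)

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((40, 100))  # stand-in for 40 MFCCs x 100 frames
patches = frame_patches(mfcc, frames_per_patch=4)  # 25 patches of 160 values
tokens = embed_tokens(patches, d_model=64, rng=rng)  # class token + 25 patch tokens
attn_out, attn = self_attention(tokens, rng)
conv_feat = conv_branch(mfcc, n_filters=16, rng=rng)

# Fuse the two branches by concatenation (an assumption) and classify.
fused = np.concatenate([attn_out[0], conv_feat])  # class-token output + CNN features
w_out = rng.standard_normal((fused.size, 8)) / np.sqrt(fused.size)
probs = softmax(fused @ w_out)  # hypothetical 8-way emotion distribution
```

The key difference from image ViT is in `frame_patches`. Each token spans the full coefficient axis over a short stretch of time, so self-attention relates moments in the utterance to one another. The 3x3 kernels in `conv_branch` look only at small neighbourhoods of the time-coefficient plane.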