29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1)
Yasser Shekofteh (2)
Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce a Persian (Farsi) Audio-Visual Database of CV syllables, named PAVID-CVs. It consists of isolated two-phoneme visyllables and isolated words related to these visyllables, which contain only Persian CV syllables, and is intended for lip-reading or audio-visual speech recognition tasks such as isolated word recognition. Because of its tagged information, the dataset can be used with machine learning-based methods. We explain the steps taken to prepare the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) were used as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, and an accuracy of 47.37% was obtained in a speaker-independent experiment.
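As an illustration of how an HMM can act as a classifier, the sketch below scores an observation sequence under one HMM per class with the forward algorithm and picks the class with the highest log-likelihood. It is a toy under stated assumptions, not the authors' setup: the observations are discrete symbols standing in for quantized lip shapes (0 = closed, 1 = rounded, 2 = open), and the labels "ba" and "bu", the two-state topology, and all probabilities are hypothetical values chosen for the example.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm."""
    n_states = len(start)
    # Initialise with the start distribution times the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Pick the label whose HMM assigns the highest log-likelihood to obs."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Hypothetical left-to-right models: (start, transitions, emissions).
# State 0 models the consonant (closed lips); state 1 models the vowel.
models = {
    "ba": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "bu": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
}

print(classify([0, 0, 2, 2], models))  # closed, closed, open, open
print(classify([0, 1, 1], models))     # closed, rounded, rounded
```

A real visual front end would more likely produce continuous feature vectors, in which case the emission tables would be replaced by continuous densities; the maximum-likelihood decision rule stays the same.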