29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1), Yasser Shekofteh (2), Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing the smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce PAVID-CVs, a Persian (Farsi) Audio-Visual Database of CV syllables. It is a set of isolated two-phoneme visyllables and isolated words related to those visyllables, all of which include only Persian CV syllables, intended for lip-reading or audio-visual speech recognition tasks such as isolated word recognition. Because the data is tagged, the dataset can be used with machine learning-based methods. We explain the steps of preparing the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) were used as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, and an accuracy of 47.37% was obtained in a speaker-independent experiment.
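The abstract does not describe the HMM topology or the visual features used, so the following is only a minimal sketch of the general idea of HMM-based classification: one model per visyllable, with the predicted label being the model that assigns the highest likelihood to the observed sequence. Everything specific here is an assumption for illustration: the labels "ba" and "da", the three quantized lip-shape codes (0 = closed, 1 = half-open, 2 = open), the two-state left-to-right structure, and all probabilities are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-observation HMM."""
    n = len(start)
    # Initialization: start probability times emission of the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    # Termination: total probability over all final states.
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the label whose HMM gives the observation the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Hypothetical toy models: (start, transition, emission) per visyllable label.
# Symbols are assumed lip-shape codes: 0 = closed, 1 = half-open, 2 = open.
left_to_right = [[0.6, 0.4], [0.0, 1.0]]
models = {
    "ba": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "da": ([1.0, 0.0], left_to_right, [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], models))
    print(classify([1, 1, 2, 2], models))
```

A real system would more likely use continuous visual features from the lip region and HMMs trained on data rather than hand-set probabilities. The sketch only illustrates the maximum-likelihood decision rule.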