29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1)
Yasser Shekofteh (2)
Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing the smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce PAVID-CVs, a Persian (Farsi) Audio-Visual Database of CV syllables. It is a set of isolated two-phoneme visyllables and isolated words related to those visyllables, where the words contain only Persian CV syllables. The database is intended for lip-reading and audio-visual speech recognition tasks such as isolated word recognition, and its tagged information makes it suitable for machine learning-based methods. We explain the steps taken to prepare the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) were used as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, and an accuracy of 47.37% was obtained in a speaker-independent experiment.
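To illustrate the general idea of using HMMs as a visyllable classifier, the following is a minimal sketch and not the authors' implementation. It assumes one discrete-observation HMM per syllable and picks the syllable whose model gives the observation sequence the highest likelihood under the forward algorithm. The syllable labels, the three quantized lip-shape symbols, and all probabilities are hypothetical toy values; the abstract does not specify the features or model topology used.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p) that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the label whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical two-state left-to-right models over three quantized lip-shape
# symbols (0 = closed, 1 = half-open, 2 = open). Each entry is
# (start probabilities, transition matrix, emission matrix).
MODELS = {
    "ba": ([1.0, 0.0],
           [[0.6, 0.4], [0.0, 1.0]],
           [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "sa": ([1.0, 0.0],
           [[0.6, 0.4], [0.0, 1.0]],
           [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], MODELS))
```

In a word-level setting, the per-syllable models could in principle be concatenated or scored in sequence to recognize words made of several CV syllables; how the paper does this is not stated in the abstract.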