29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1)
Yasser Shekofteh (2)
Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing the smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce a Persian (Farsi) Audio-Visual Database of CV syllables, named PAVID-CVs, for lip-reading and audio-visual speech recognition tasks such as isolated word recognition. It consists of isolated two-phoneme visyllables and isolated words related to those visyllables, all built only from Persian CV syllables. Its tagged information makes the dataset suitable for machine learning-based methods. We explain the steps of preparing the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) are used as visyllable classifiers. These models are then used for visual recognition of 6 Persian words with different numbers of syllables, yielding an accuracy of 47.37% in a speaker-independent experiment.
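The abstract does not give the model details, so the following is only a minimal sketch of the general idea of using HMMs as classifiers: one model per class, with a sequence assigned to the class whose model gives the highest likelihood. It assumes discrete observation symbols (hypothetical quantized lip-shape codes), and the two toy models, their parameters, and the labels "ba" and "bu" are illustrative assumptions, not values from the paper.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Pick the label whose HMM assigns the highest log-likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical symbols: 0 = closed lips, 1 = rounded lips, 2 = open lips.
# Both toy models are 2-state left-to-right HMMs: (start, transitions, emissions).
START = [1.0, 0.0]
TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "ba": (START, TRANS, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "bu": (START, TRANS, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], MODELS))
    print(classify([0, 1, 1, 1], MODELS))
```

A real system would work on continuous visual features extracted from the mouth region and would train the model parameters on data instead of fixing them by hand; the sketch only shows the maximum-likelihood decision rule.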