29th Iranian Conference on Electrical Engineering

PAVID-CVs: Persian Audio-Visual Database of CV syllables

Authors:

Mahsa Hedayatipour¹ Yasser Shekofteh² Mohsen Ebrahimi Moghaddam³

1- Shahid Beheshti University 2- Shahid Beheshti University 3- Shahid Beheshti University

Keywords:

Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.

Abstract:

Lip-reading is a form of visual speech recognition. In this process, recognizing smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce PAVID-CVs, a Persian (Farsi) audio-visual database of CV syllables. It consists of isolated two-phoneme visyllables and isolated words related to those visyllables, which include only Persian CV syllables, and it is intended for lip-reading and audio-visual speech recognition tasks such as isolated word recognition. Because it includes tagged information, the dataset can be used with machine learning-based methods. We describe the steps taken to prepare the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) were used as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, and an accuracy of 47.37% was obtained in a speaker-independent experiment.
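To illustrate the general idea of using HMMs as a classifier, the following is a minimal sketch, not the authors' implementation. The abstract does not state the visual features, model topology, or training procedure, so everything concrete here is an assumption: the observations are hypothetical quantized lip-shape symbols, the two toy models (`"ba"` and `"sa"`) have hand-picked parameters rather than trained ones, and the function names are illustrative. The sketch scores a sequence under one HMM per class with the forward algorithm and picks the class with the highest log-likelihood.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in the log domain for a discrete-observation HMM.

    obs   : list of observation symbol indices
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the label whose HMM assigns the highest log-likelihood to obs."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))

# Hypothetical toy models (assumed, not from the paper): two-state left-to-right
# HMMs over three quantized lip-shape symbols (0 = closed, 1 = half-open, 2 = open).
# Each entry is (start, trans, emit).
MODELS = {
    "ba": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
           [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "sa": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
           [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}

if __name__ == "__main__":
    # A sequence that starts with closed lips and then opens.
    print(classify([0, 0, 2, 2], MODELS))
```

In a real system the symbols would be replaced by features extracted from the mouth region and the parameters would be estimated from training data; the maximum-likelihood decision rule in `classify` stays the same.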