29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1), Yasser Shekofteh (2), Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process. In this process, recognizing smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce a Persian (Farsi) Audio-Visual Database of CV syllables, named PAVID-CVs. It consists of isolated two-phoneme visyllables and isolated words related to those visyllables, which contain only Persian CV syllables, and it is intended for lip-reading or audio-visual speech recognition tasks such as isolated word recognition. Because the data is tagged with useful information, it can be used with machine learning-based methods. We explain the steps of preparing the database, which contains about 30 hours of data from 40 speakers. Initial experiments use hidden Markov models (HMMs) as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, and an accuracy of 47.37% was obtained in a speaker-independent experiment.
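To illustrate the general idea of using HMMs as a visyllable classifier, here is a minimal sketch: one HMM per visyllable, with an observation sequence assigned to the model of highest likelihood. It is not the authors' implementation. The discrete observation symbols (quantized lip shapes), the two toy models "ba" and "bu", and all probabilities are assumptions invented for this example; a real system would more likely use continuous visual features and models trained on the database.

```python
import math

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, via the forward algorithm."""
    n = len(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Recursion: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical toy models: 2-state left-to-right HMMs over 3 symbols
# (0 = closed lips, 1 = rounded lips, 2 = open lips). Values are made up.
MODELS = {
    # name: (start probabilities, transition matrix, emission matrix)
    "ba": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "bu": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2], MODELS))
    print(classify([0, 1, 1, 1], MODELS))
```

In this sketch the consonant state is shared and only the vowel state differs, so the decision rests on the later frames of the sequence. Working in the log domain avoids underflow on longer sequences.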