29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1), Yasser Shekofteh (2), Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. This paper introduces PAVID-CVs, a Persian (Farsi) Audio-Visual Database of CV syllables. It consists of isolated two-phoneme visyllables and isolated words related to those visyllables, which contain only Persian CV syllables, and it is intended for lip-reading and audio-visual speech recognition tasks such as isolated word recognition. Because the data are tagged, the dataset is suitable for machine-learning-based methods. We describe the steps taken to prepare the database, which contains about 30 hours of data from 40 speakers. In initial experiments, hidden Markov models (HMMs) were used as visyllable classifiers. These models were then used for visual recognition of 6 Persian words with different numbers of syllables, yielding an accuracy of 47.37% in a speaker-independent experiment.
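The general idea of using HMMs as a classifier can be sketched as follows: one model is kept per class, an observation sequence is scored under each model with the forward algorithm, and the class with the highest likelihood wins. This is only an illustrative sketch, not the authors' implementation. The abstract does not state the visual features, model topology, or toolkit, so the discrete lip-shape symbols (0 = closed, 1 = half open, 2 = open), the two-state left-to-right models, their probabilities, and the labels `CV_a` and `CV_b` are all hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-observation HMM."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][symbol])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the label whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical toy models: (start probabilities, transitions, emissions).
# Both start with closed lips; they differ in the lip shape of the second state.
models = {
    "CV_a": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
             [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),  # second state: mostly "open"
    "CV_b": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
             [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),  # second state: mostly "half open"
}

print(classify([0, 0, 2, 2], models))
print(classify([0, 1, 1, 1], models))
```

A real system would more plausibly use continuous visual features extracted from the mouth region with Gaussian emission densities, and would train the model parameters from data rather than setting them by hand; the maximum-likelihood decision rule shown here would stay the same.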