29th Iranian Conference on Electrical Engineering
PAVID-CVs: Persian Audio-Visual Database of CV syllables
Authors:
Mahsa Hedayatipour (1)
Yasser Shekofteh (2)
Mohsen Ebrahimi Moghaddam (3)
1- Shahid Beheshti University
2- Shahid Beheshti University
3- Shahid Beheshti University
Keywords:
Visual Speech Recognition, Lip Reading, CV syllables, Visyllable, Audio-Visual Database, Persian/Farsi Language.
Abstract:
Lip-reading is a visual speech recognition process in which recognizing smaller units of speech can serve as the basis for recognizing larger units of a language, such as words. In this paper, we introduce PAVID-CVs, a Persian (Farsi) Audio-Visual Database of CV syllables intended for lip-reading and audio-visual speech recognition tasks such as isolated word recognition. It consists of isolated two-phoneme visyllables and isolated words related to those visyllables, all composed only of Persian CV syllables. Because the data carry useful tagged information, the dataset is suitable for machine-learning-based methods. We describe the steps taken to prepare the database, which contains about 30 hours of data from 40 speakers. Initial experiments use hidden Markov models (HMMs) as visyllable classifiers. These models were then used for visual recognition of six Persian words with different numbers of syllables, achieving an accuracy of 47.37% in a speaker-independent experiment.
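The abstract uses HMMs first as visyllable classifiers and then for recognizing words built from several syllables. The following is a minimal sketch of one common way to do both, not the paper's actual method: one discrete HMM per visyllable, scored with the forward algorithm, and each candidate word scored by chaining the HMMs of its visyllables. Everything concrete here is hypothetical: the discrete observation symbols (imagined as vector-quantized lip-shape codebook indices), the toy parameters, the labels "ba"/"su", the two-word lexicon, and the `exit_prob` chaining scheme.

```python
import math

NEG_INF = float("-inf")


def safe_log(x):
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else NEG_INF


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


class DiscreteHMM:
    """HMM over discrete symbols: pi[i] initial, A[i][j] transition, B[i][k] emission."""

    def __init__(self, pi, A, B):
        self.pi, self.A, self.B = pi, A, B

    def log_likelihood(self, obs):
        """log P(obs | model), computed with the forward algorithm in log space."""
        n = len(self.pi)
        alpha = [safe_log(self.pi[i]) + safe_log(self.B[i][obs[0]]) for i in range(n)]
        for o in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + safe_log(self.A[i][j]) for i in range(n)])
                + safe_log(self.B[j][o])
                for j in range(n)
            ]
        return logsumexp(alpha)


def classify(obs, models):
    """Pick the visyllable label whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


def concatenate(models, exit_prob=0.3):
    """Chain HMMs into one word-level HMM (hypothetical scheme): the last state
    of each model moves to the next model's initial states with exit_prob."""
    sizes = [len(m.pi) for m in models]
    n = sum(sizes)
    A = [[0.0] * n for _ in range(n)]
    B, offset = [], 0
    for k, m in enumerate(models):
        s = sizes[k]
        for i in range(s):
            scale = 1.0
            if i == s - 1 and k < len(models) - 1:
                # Reserve exit_prob of the last state's mass for the next model.
                scale = 1.0 - exit_prob
                for j, p in enumerate(models[k + 1].pi):
                    A[offset + i][offset + s + j] = exit_prob * p
            for j in range(s):
                A[offset + i][offset + j] = scale * m.A[i][j]
            B.append(m.B[i])
        offset += s
    pi = list(models[0].pi) + [0.0] * (n - sizes[0])
    return DiscreteHMM(pi, A, B)


def recognize_word(obs, lexicon, models):
    """Score obs against each word's chained visyllable HMMs; return the best word."""
    def score(word):
        return concatenate([models[v] for v in lexicon[word]]).log_likelihood(obs)
    return max(lexicon, key=score)


# Hypothetical toy models: 2 left-to-right states, 3 codebook symbols.
MODELS = {
    "ba": DiscreteHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                      [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "su": DiscreteHMM([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                      [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1]]),
}
LEXICON = {"baba": ["ba", "ba"], "susu": ["su", "su"]}  # hypothetical words

print(classify([0, 0, 2, 2], MODELS))                 # ba
print(recognize_word([0, 2, 0, 2], LEXICON, MODELS))  # baba
```

Discrete symbols keep the sketch dependency-free; a practical lip-reading system would more likely use continuous (for example Gaussian-mixture) emissions over visual features and estimate the parameters with Baum-Welch training rather than setting them by hand.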