31st International Conference on Electrical Engineering
Improving CycleGAN-VC2 Voice Conversion by Learning MCD-Based Evaluation and Optimization
Authors:
Majid Behdad (1), Davood Gharavian (2)
1- Shahid Beheshti University
2- Shahid Beheshti University
Keywords:
CycleGAN-VC, perceptual evaluation, perceptual optimization, MetricGAN, Mel-Cepstral distance, speech quality assessment, NISQA tool
Abstract:
Voice conversion (VC) systems, which convert the voice of a source speaker in an utterance to that of a target speaker, have many applications, so improving their quality is important. One approach to improving VC quality that has not yet attracted enough attention is to optimize the discriminators of a GAN-based VC system. In this paper, we choose CycleGAN-VC2 as the baseline and implement a modified version of a mel-scale, human-hearing-related objective evaluation metric, the Modified Mel-Cepstral Distance (MMCD), to help the discriminators better learn to judge between real and fake data. We developed the new MMCD metric to lie between 0 and 1 so that it can be used in the discriminators' loss functions. The main goal is to force the discriminators to learn the behavior of the MMCD metric in their judgments. In conventional CycleGAN-VC2, by contrast, the discriminators act as classifiers that decide which data is real and which is fake, without any attention to perceptual references or to measures such as an MCD score that can vary continuously from zero to one. Experimental results show improved output speech quality in terms of the MCD measure, even though our baseline VC system is trained on non-parallel data and uses no time alignment during training. Greater improvements could therefore be anticipated in parallel VC systems.
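The idea of a discriminator that learns a bounded distance score can be illustrated with a minimal sketch. The mel-cepstral distance below is the standard frame-averaged formula in dB. Everything else is an assumption made only for illustration: the abstract does not define MMCD, so `bounded_score` (an exponential squashing with a hypothetical `scale` parameter) is a stand-in and not the authors' metric, and `metric_discriminator_loss` is a generic MetricGAN-style regression loss. The sketch also assumes the two sequences are already time-aligned and of equal length.

```python
import numpy as np


def mel_cepstral_distance(ref, conv):
    """Mean mel-cepstral distance in dB between two cepstral sequences.

    ref, conv: arrays of shape (frames, coeffs). Assumption: the sequences
    are already time-aligned and the 0th (energy) coefficient is dropped.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("sequences must be aligned and equal in shape")
    diff = ref - conv
    # Per-frame distance: sqrt(2 * sum over coefficients of squared error).
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))


def bounded_score(mcd_db, scale=10.0):
    """Hypothetical mapping of MCD into (0, 1]; 1.0 means identical inputs.

    NOT the paper's MMCD: the abstract does not give its definition.
    """
    return float(np.exp(-mcd_db / scale))


def metric_discriminator_loss(d_out, target_score):
    """MetricGAN-style loss: the discriminator regresses onto the score
    instead of classifying real versus fake."""
    d_out = np.asarray(d_out, dtype=float)
    return float(np.mean((d_out - target_score) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(50, 24))           # placeholder target features
    converted = target + 0.1 * rng.normal(size=(50, 24))
    mcd = mel_cepstral_distance(target, converted)
    score = bounded_score(mcd)
    print(mcd, score, metric_discriminator_loss([0.5, 0.7], score))
```

In this sketch a real sample compared with itself yields a score of 1.0, and the score decays toward 0 as the converted features drift from the target, which gives the discriminator a continuous regression target rather than a binary label.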
List of archived papers
Efficiency Enhancement of Heterojunction IBC Solar Cell: Surface Passivation
Amirmohammad Shahryari - Zohreh Golshan bafghi - Negin Manavizadeh
Service Restoration in Distribution Networks Based on a Two-stage Power Flow Model
Saman Armand - Jalal Heidary - Eli Shirazi
Peer-to-peer Energy Sharing Considering Prosumers' Preferences and Load Uncertainties
Mohammad Bagher Moradi - Mohammad Hasan Nazari - Seyed Hossein Hosseinian - Hamed Nafisi
Online Estimation of Power System Inertia Using Electromechanical Oscillation Parameters with High Penetration of Renewables
Shwan Sheikhahmadi - Ali Hesami Naghshbandy - Ayda Faraji
Investigating the Dynamic Impact of Grid-Connected Solar Energy Systems on Transformer Loading and Improving the Performance of the Low-Voltage Electricity Distribution Network
Mehdi Mohammadi - Reza Khodadi - Ali Masoumi
A New Low Noise 4-Gb/s Serial CMOS MPPM Modulator
Erfan Alasvand Andekah - Noushin Ghaderi - Mostafa Pour Sayahi
Design and Simulation of a Wideband Reflectarray Using Reflected-Wave Polarization Rotation and Multi-Frequency Phase Synthesis of the Antenna Aperture
Majid Karimipour - Iman Aryanian
Diagnosis of Heart Diseases based on Processing Heart Sound using Machine Learning
Maryam Moulaverdi - Akbar Ranjbar
Thermo-optically Adjustment of Stimulated Brillouin Scattering in Integrated Slot Ring Resonators
Mahdi Piri - Bijan Abbasi Arand - Sayyed Reza Mirnaziry
Low-Leakage 6T SRAM Cell for In-Memory Computing with High Stability
Deniz Najafi - Behzad Ebrahimi