31st International Conference on Electrical Engineering
Improving CycleGAN-VC2 Voice Conversion by Learning MCD-Based Evaluation and Optimization
Authors:
Majid Behdad (1)
Davood Gharavian (2)
1- Shahid Beheshti University
2- Shahid Beheshti University
Keywords:
CycleGAN-VC, perceptual evaluation, perceptual optimization, MetricGAN, Mel-Cepstral Distance, speech quality assessment, NISQA tool
Abstract:
Voice conversion (VC) systems, which convert a source speaker's voice in an utterance into that of a target speaker, have many applications, and improving their quality is important. One approach to improving VC quality that has not yet received enough attention is optimizing the discriminators of a GAN-based VC system. In this paper, we take CycleGAN-VC2 as the baseline and implement the Modified Mel-Cepstral Distance (MMCD), a modified version of the Mel-scale objective evaluation metric that reflects human hearing, to help the discriminators learn to better distinguish real data from fake data. We developed MMCD so that its value lies between 0 and 1, which allows it to be used in the discriminators' loss functions. The main goal is to make the discriminators learn the behavior of the MMCD metric in their judgements. In conventional CycleGAN-VC2, by contrast, the discriminators act as classifiers that decide which data is real and which is fake, without regard to perceptual references or measures such as an MCD-based score that can vary continuously from zero to one. Experimental results show improvements in output speech quality as measured by MCD, even though our baseline VC system is trained on non-parallel data and uses no time alignment during training. Greater improvements could therefore be expected in parallel VC systems.
List of archived papers
Security and Privacy Smart Contract Architecture for Energy Trading based on Blockchains
Masoumeh Nazari - Siavash Khorsandi - Jaber Babaki
Experimental Study of Pick and Place Operation for Packaging Using Delta Parallel Robot with Two-Fingered Gripper
Mona Mohades Mojtahedi - Arvin Mohammadi - Mehdi Tale Masouleh
Temporal Green's function of an RLC resonator with arbitrary time-varying capacitance using differential transition matrix
Somayeh Boshgazi - Khashayar Mehrany - Mohammad Memarian
A modified Dempster Shafer approach to classification in surgical skill assessment
Arash Iranfar - Mohammad Soleymannejad - Behzad Moshiri - Hamid D. Taghirad
Addressing Death from Heart Failure Using RACER Algorithm
Mohammad Mirsafaei - Alireza Basiri
A Wideband PLL with Programmable LC VCO with 5.1 to 7.9GHz Lock Range
Mohsen Azimikia - Arash Esmaili
Average Secrecy Capacity Performance Analysis for SWIPT-Based SIMO Underlay Cognitive Radio
Mohammad Javad Saber - Seyedeh Maryam Mazloum - Seyed Mohammad Sajad Sadough
Effect of structural connectivity weightings in graph-based analysis in Schizophrenia
Sara Khamseh - Farzaneh Keyvanfard
Smart EV Charging in Residential Power Grids Considering Users’ Preferences
Mahya Shahshahani - Ali Moradi Amani - Mahdi Jalili
Angular Stable Multiband Miniaturized Flexible Frequency Selective Surface
Mozhgun Moazzamnia - Javad Nourinia - Changiz Ghobadi - Keyhan Hosseini - Mohsen Karamirad - Baman Mohammadi