31st International Conference on Electrical Engineering
Improving CycleGAN-VC2 Voice Conversion by Learning MCD-Based Evaluation and Optimization
Authors:
Majid Behdad (1), Davood Gharavian (2)
1- Shahid Beheshti University
2- Shahid Beheshti University
Keywords:
CycleGAN-VC, perceptual evaluation, perceptual optimization, MetricGAN, Mel-cepstral distance, speech quality assessment, NISQA tool
Abstract:
Voice conversion (VC) systems, which convert the voice of a source speaker in an utterance to that of a target speaker, have various applications, and improving their quality is very important. One approach to improving VC quality that has not yet attracted enough attention is to optimize the discriminators of a GAN-based VC system. In this paper, we choose CycleGAN-VC2 as the baseline and implement a modified version of an objective evaluation metric related to the Mel scale of human hearing, the Modified Mel-Cepstral Distance (MMCD), to help the discriminators better learn to judge between real and fake data. We developed our new metric, MMCD, to lie between 0 and 1 so that it can be used in the discriminators' loss functions. The main goal is to force the discriminators to learn the behavior of the MMCD metric in their judgments. In conventional CycleGAN-VC2, by contrast, the discriminators act as classifiers that decide which data is real and which is fake, without any attention to perceptual references or measures such as an MCD-based score that varies continuously from zero to one. Experimental results show improvements in output speech quality as measured by MCD, even though our baseline VC system is trained on non-parallel data and does not use any time alignment during training. Greater improvements could therefore be anticipated in parallel VC systems.
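As a rough illustration of the idea in the abstract, the sketch below computes the standard mel-cepstral distance (MCD) in dB and then uses a bounded score as a continuous discriminator target, in the spirit of MetricGAN. The abstract does not give the MMCD formula, so `bounded_score` and its `scale` parameter are hypothetical stand-ins, not the authors' metric. The function names, the squared-error loss form, and the assumption that the two cepstral sequences are already time-aligned are likewise assumptions made only for this example.

```python
import numpy as np


def mel_cepstral_distance(ref, conv):
    """Mean MCD in dB between two mel-cepstral sequences.

    Both inputs have shape (frames, dims) and are assumed to be
    time-aligned already (an assumption of this sketch). The 0th
    coefficient (energy) is excluded, as is conventional.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    diff = ref[:, 1:] - conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def bounded_score(mcd_db, scale=10.0):
    """Hypothetical mapping of MCD to (0, 1]; 1.0 means identical.

    This is NOT the paper's MMCD, only an illustrative bounded score.
    """
    return float(np.exp(-mcd_db / scale))


def metric_discriminator_loss(d_out, target_score):
    """Squared error between discriminator outputs and a continuous target.

    Assumed loss form: the discriminator regresses the score instead of
    predicting a hard real/fake label.
    """
    d_out = np.asarray(d_out, dtype=float)
    return float(np.mean((d_out - target_score) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(50, 25))            # toy target features
    converted = target + 0.1 * rng.normal(size=(50, 25))
    score = bounded_score(mel_cepstral_distance(target, converted))
    print("bounded score:", score)
    print("loss:", metric_discriminator_loss([0.9, 0.8], score))
```

With a continuous target of this kind, a converted sample that is close to the reference receives a score near 1 and a distant one receives a score near 0, which is the behavior a binary real/fake label cannot express.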
List of papers
List of archived papers
Fuzzy Fractional Order Sliding Mode Controller Design for a Wind Turbine with DFIG
Mohammad Hossein Aghaseyedabdollah - Yasin Alavian - Hadi Azmi - Alireza Yazdizadeh
Underwater Image Quality Assessment via Color and Contrast Analysis
Meysam Ghalyani - Maryam Karimi
Design and Fabrication of a Millimeter-Wave Circular SAR Imaging System
علی آقاکثیری - امیرعلی بنایی کاشانی - علی تاجیک - علیرضا کیایی - هنگامه عزیزی - مهدی عندلیبی - سامان غضنفری - محمد فخارزاده
Investigation of Magnetic Control of the Chiroptical Response of Magnetochiral Structures
کی سیاوش کیکاوسی - حمیده دشتی خویدکی - جواد احمدی شکوه - مجید رشیدی هویه
Adaptive Control of Switched Nonlinear Systems with Unknown Control Directions
Elham Ovaysi - Marzieh Kamali
Audio-Based Depression Estimation Using a Filter Bank and the ResNet Neural Network
علی نیک خراسانی - محمدرضا اکبرزاده توتونچی - مجید غیورمبرهن
Single-Channel Recursive Speech Separation with Unknown Speaker Count by Mask Estimation
Hadi Alizadeh - Rahil Mahdian Toroghi - Hassan Zareian
On the Correction of the Boundary Deformation Errors in Microwave Imaging With Spatial Priors
Seyyed Mohammad Hosseini - Amir Ahmad Shishegar
Terahertz transceiver front-end based on spatiotemporally modulated graphene-based structures
Mahsa Valizadeh - Leila Yousefi - MirFaez Miri
Enhanced the Droop Approach MMC-Based in AC Microgrids
Amirhossein Fallah Bagheri - Hamid Reza Baghaee - Ali Yazdian Varjani - Kourosh Khalaj Monfared - Reza Alizadeh