31st International Conference on Electrical Engineering
Improving CycleGAN-VC2 Voice Conversion by Learning MCD-Based Evaluation and Optimization
Authors:
Majid Behdad (1)
Davood Gharavian (2)
1- Shahid Beheshti University
2- Shahid Beheshti University
Keywords:
CycleGAN-VC, perceptual evaluation, perceptual optimization, MetricGAN, Mel-Cepstral distance, speech quality assessment, NISQA tool
Abstract:
Voice conversion (VC) systems, which convert the voice of a source speaker in a speech utterance to that of a target speaker, have many applications, so improving their quality is important. One approach that has not yet attracted enough attention for improving VC quality is optimizing the discriminators of a GAN-based VC system. In this paper, we take CycleGAN-VC2 as the baseline and implement a modified version of the Mel-Cepstral Distance (MCD), an objective evaluation metric based on the mel scale of human hearing, which we call the Modified Mel-Cepstral Distance (MMCD), to help the discriminators learn to judge between real and fake data. MMCD takes values between 0 and 1 so that it can be used in the discriminators' loss functions. The main goal is to force the discriminators to learn the behavior of the MMCD metric in their judgments. In conventional CycleGAN-VC2, by contrast, the discriminators act as classifiers that decide which data is real and which is fake, without regard to perceptual references or to measures such as an MCD-based score that varies continuously from zero to one. Experimental results show improved output speech quality as measured by MCD, even though our baseline VC system is trained on non-parallel data and does not use any time alignment during training. Greater improvements could therefore be anticipated in parallel VC systems.
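As a rough illustration of the idea in the abstract, the sketch below computes the standard frame-averaged MCD in dB and shows how a score bounded to [0, 1] could serve as a regression target for a discriminator, in the spirit of MetricGAN. It rests on assumptions that are not taken from the paper: the two sequences are assumed to be already time-aligned and of equal length (the paper's training uses no time alignment), the 0th cepstral coefficient is skipped by common convention, and `bounded_score` with its `scale_db` parameter is a hypothetical squashing function, not the authors' MMCD definition.

```python
import math


def mcd_db(ref_frames, conv_frames):
    """Mean Mel-Cepstral Distortion in dB over frames.

    Assumes the two sequences are already time-aligned and equally long.
    Each frame is a list of mel-cepstral coefficients; coefficient 0
    (energy) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty sequences of equal length")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def bounded_score(mcd, scale_db=10.0):
    """Hypothetical map from MCD (dB) to (0, 1]; 1.0 means identical.

    This is an illustrative choice, NOT the paper's MMCD formula.
    """
    return math.exp(-mcd / scale_db)


def discriminator_metric_loss(d_output, target_score):
    """MetricGAN-style regression loss: squared error between the
    discriminator output and the metric-derived target score."""
    return (d_output - target_score) ** 2


# Usage: one frame pair that differs by 1.0 in a single coefficient.
reference = [[0.0, 1.0, 0.0]]
converted = [[0.0, 0.0, 0.0]]
distance = mcd_db(reference, converted)
target = bounded_score(distance)
loss = discriminator_metric_loss(0.5, target)
```

Under this assumed mapping, a discriminator trained with the squared-error loss is pushed to output a continuous quality estimate rather than a hard real/fake label, which is the contrast the abstract draws with conventional CycleGAN-VC2.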