31st International Conference on Electrical Engineering
Improving CycleGAN-VC2 Voice Conversion by Learning MCD-Based Evaluation and Optimization
Authors:
Majid Behdad 1
Davood Gharavian 2
1- Shahid Beheshti University
2- Shahid Beheshti University
Keywords:
CycleGAN-VC, perceptual evaluation, perceptual optimization, MetricGAN, Mel-Cepstral distance, speech quality assessment, NISQA tool
Abstract:
Voice conversion (VC) systems, which convert the voice of a source speaker to that of a target speaker in a speech utterance, have many applications, and improving their quality is important. One approach to improving VC quality that has not yet attracted enough attention is optimizing the discriminators of a GAN-based VC system. In this paper, we choose CycleGAN-VC2 as the baseline and implement a modified version of a Mel-scale, human-hearing-related objective evaluation metric, the Modified Mel-Cepstral Distance (MMCD), to help the discriminators learn to judge between real and fake data. We developed the new MMCD metric to take values between 0 and 1 so that it can be used in the discriminators' loss functions. The main goal is to force the discriminators to learn the behavior of the MMCD metric in their judgments. In conventional CycleGAN-VC2, by contrast, the discriminators work as classifiers that decide which data is real and which is fake, without any attention to perceptual references or to measures such as the MCD-based score, which varies continuously from zero to one. Experimental results show improvements in output speech quality as measured by MCD, even though our baseline VC system is trained on non-parallel data and uses no time alignment during training. Larger improvements could therefore be anticipated in parallel VC systems.
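The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how MMCD is normalized. The sketch computes the conventional Mel-cepstral distance in dB, squashes it into (0, 1] with a hypothetical mapping (`bounded_score` and its `scale` parameter are assumptions made here), and uses that score as the regression target for fake samples in a MetricGAN-style least-squares discriminator loss. All function names are illustrative.

```python
import numpy as np


def mel_cepstral_distance(ref, conv):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Both inputs have shape (frames, coefficients). Column 0 (the energy
    term) is excluded, as is conventional for MCD.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("sequences must have the same shape (align them first)")
    diff = ref[:, 1:] - conv[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def bounded_score(mcd_db, scale=10.0):
    """ASSUMPTION: a hypothetical squashing of MCD into (0, 1].

    Returns 1.0 for identical features and tends to 0 as MCD grows.
    The paper's actual MMCD definition is not given in the abstract.
    """
    return 1.0 / (1.0 + mcd_db / scale)


def metric_discriminator_loss(d_real, d_fake, fake_scores):
    """MetricGAN-style least-squares discriminator loss.

    Real samples are pushed toward 1; each fake sample is pushed toward
    its continuous metric score instead of a hard 0 label.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    fake_scores = np.asarray(fake_scores, dtype=float)
    return float(np.mean((d_real - 1.0) ** 2) + np.mean((d_fake - fake_scores) ** 2))
```

A conventional discriminator would use a fixed target of 0 for every fake sample. Here the target moves with the perceptual score, so the discriminator is trained to approximate the metric rather than a binary decision.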