Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder: Paper Translation (2)

05-09

Paper link

D. Objective Evaluations

We visualize the mean Mel-cepstral distortion (MCD) values on the evaluation set in Fig. 2. Our proposed methods, trained on unaligned data, performed on par with the baselines that utilized aligned frames. The results might imply that all the systems achieved a comparable level of performance. As MCD is not a representative indicator of perception, we further conducted subjective evaluations of voice quality and similarity.

E. Subjective Evaluations

For the subjective evaluations, we chose ENMF-3000 as the baseline because it offered higher synthetic voice quality than ENMF-512. We evaluated our proposed method (VAE-pair) through listening tests, for which ten listeners were invited to evaluate the results. We divided our experiments into inter-gender and intra-gender conversion. Each listener rated voice quality for a mean opinion score (MOS) test and took ABX tests on voice quality and target similarity. The results are shown in Fig. 3.
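The paper does not describe how the ABX responses were scored. One common summary, assumed here, is the share of trials won by each system, optionally paired with an exact two-sided sign (binomial) test against a 50/50 split. The sign test is an illustrative addition, not an analysis reported in the paper, and the function names and trial labels are hypothetical.

```python
import math
from collections import Counter


def abx_preference(choices):
    """Share of ABX trials won by each system.

    `choices` holds one label per trial: the system whose sample the listener
    picked (e.g. as closer to the reference X). Labels are arbitrary strings.
    """
    if not choices:
        raise ValueError("no trials")
    counts = Counter(choices)
    n = len(choices)
    return {label: counts[label] / n for label in sorted(counts)}


def two_sided_sign_test(k, n):
    """Exact two-sided binomial p-value for k wins out of n trials at p = 0.5."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # Probability of each possible win count under "no preference".
    probs = [math.comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    # Two-sided p-value: total probability of outcomes no more likely than k.
    p = sum(q for q in probs if q <= observed * (1 + 1e-12))
    return min(1.0, p)
```

For example, `abx_preference(["A", "B", "A", "A"])` returns `{"A": 0.75, "B": 0.25}`. A 10-0 split gives `two_sided_sign_test(10, 10)` of about 0.002, whereas a 5-5 split gives 1.0, so a near even split is what a "comparable level" result looks like under this test.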

The ABX test on target similarity revealed that both systems performed at a comparable level. This result was anticipated and is consistent with the objective MCD evaluation. As for voice quality, our proposed method also achieved a level similar to that of the ENMF-3000 baseline: VAE-pair achieved an MOS of 2.76 (standard deviation 0.44), while ENMF-3000 achieved 2.75 (standard deviation 0.50). This result was rather encouraging, since we initially conjectured that the performance degradation would be somewhat larger because VAE-pair used unaligned training data. Note that the voice quality of ENMF-3000 was reasonably acceptable (unlike that of ENMF-512, which was only borderline acceptable). More subjective evaluations of VAE-multi and VAE-disj will be conducted in our future work.
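The paper reports each MOS together with a standard deviation but does not say whether the deviation is taken over individual ratings or over per-listener averages, whether it is the sample or population value, or which rating scale was used. The sketch below assumes a 5-point scale (1 = bad, 5 = excellent), pools all individual ratings, and uses the sample standard deviation; the function name is hypothetical.

```python
import statistics


def mos_summary(ratings):
    """Return (MOS, sample standard deviation) of pooled listener ratings.

    Assumes each rating lies on a 5-point scale (1 = bad, 5 = excellent);
    the paper does not state its scale or how its deviation was computed.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a standard deviation")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the assumed 1-5 scale")
    return statistics.mean(ratings), statistics.stdev(ratings)
```

For instance, `mos_summary([2, 3, 3, 4])` returns a mean of 3 and a standard deviation of about 0.82; these ratings are made up for illustration and are not the paper's data. Applied to the reported figures, the 0.01 gap between 2.76 and 2.75 is small compared with either reported standard deviation (0.44 and 0.50).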

