Here we present synthesis samples generated by our proposed system and by our implementation of the model from "Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning" [1]. Our implementation differs from the original TTS model in three ways:
1. We removed the residual encoder.
2. We replaced the hybrid attention module with forward attention with a transition agent, for greater stability.
3. The original model has no language code for code-switching utterances such as those in the DB4 dataset, so we made a simple adaptation: code-switching utterances are assigned a new language accent code '2', while '0' denotes English and '1' denotes Chinese.
Samples from this model are labeled 'our implementation of CLMS/BLMS in [1]'. Each of them is synthesized with the language code that matches its input text: '0' for English text, '1' for Chinese text, and '2' for code-switching text.
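As a rough illustration of the accent-code scheme described above, the sketch below picks a code for an input sentence. It is a hypothetical helper, not code from our system: the function name `language_accent_code` is invented for illustration, and it assumes the code can be inferred from the scripts present in the text (CJK characters versus Latin letters), whereas in practice the label may simply come from dataset metadata.

```python
import re

# Language accent codes: 0 = English, 1 = Chinese, 2 = code-switching.
ENGLISH, CHINESE, CODE_SWITCHING = 0, 1, 2

# Assumption: script detection by character class is enough to tell the
# three cases apart (CJK Unified Ideographs vs. ASCII Latin letters).
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LATIN = re.compile(r"[A-Za-z]")


def language_accent_code(text: str) -> int:
    """Return the accent code for an utterance (hypothetical heuristic).

    Text containing both CJK characters and Latin letters is treated as
    code-switching; text with only CJK characters is Chinese; anything
    else (including an empty string) falls back to English.
    """
    has_cjk = bool(_CJK.search(text))
    has_latin = bool(_LATIN.search(text))
    if has_cjk and has_latin:
        return CODE_SWITCHING
    if has_cjk:
        return CHINESE
    return ENGLISH


if __name__ == "__main__":
    for sentence in [
        "His most significant scientific publications were studies of birds and animals.",
        "现在呢有很多朋友 都喜欢打游戏",
        "用UC浏览器搜索 I believe I can fly",
    ]:
        print(language_accent_code(sentence), sentence)
```

Under this heuristic the three sample sentences map to codes 0, 1 and 2 respectively, mirroring how the English, Mandarin and code-switching samples on this page are conditioned.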
Original Voices
Data-sufficient
DB1
LJS
DB4-Mandarin
DB4-English
DB4-Code-switching
English Sentence Synthesis
Text
CLMS
BLMS
our implementation of CLMS in [1]
our implementation of BLMS in [1]
Their solution requires development of the human capacity for social interest.
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
His most significant scientific publications were studies of birds and animals.
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
They established royal commissions to recover illegally held church lands.
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
Unfortunately, others separate on the basis of accumulated hatred.
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
Mandarin Sentence Synthesis
Text
CLMS
BLMS
our implementation of CLMS in [1]
our implementation of BLMS in [1]
建筑设计师莱伊恩受命 设计了英国温泽市政府大厅
(The architect Lai Yi'en was commissioned to design the Windsor town hall in the UK)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
现在呢有很多朋友 都喜欢打游戏
(Nowadays, many friends like playing video games)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
堪忧本来就是令人担忧的意思
(The word 堪忧 itself already means "worrying")
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
这个时候有两个身影 就向着火场逆行
(At that moment, two figures ran against the crowd toward the fire)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
Code-switching Sentence Synthesis
Text
CLMS
BLMS
our implementation of CLMS in [1]
our implementation of BLMS in [1]
用UC浏览器搜索 I believe I can fly
(Use UC Browser to search for "I believe I can fly")
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
我那个 company culture 其实 就是 everyone help each other
(My company's culture is really just that everyone helps each other)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
most of the time 就是去玩一下
(Most of the time, we just go out and have some fun)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
其实我很难判断 in my heart i think my chinese is better but people tell me that my english 是比较好
(Actually, it's hard for me to tell. In my heart, I think my Chinese is better, but people tell me that my English is better)
DB4
LJS
DB1
LJS
DB1
DB4
LJS
DB1
LJS
DB1
Reference
[1] Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Z. Chen, R. Skerry-Ryan, Y. Jia, A. Rosenberg, B. Ramabhadran, Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, in: Proc. Interspeech 2019, pp. 2080–2084.