Introduction


This page presents speech samples synthesized by our proposed system alongside samples from our implementation of the model in "Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning" [1]. Our implementation departs from the original TTS model in three ways:

1. We removed the residual encoder.
2. We replaced the hybrid attention module with forward attention with a transition agent, for higher stability.
3. The original model is not suited to the code-switching audio samples in the DB4 dataset, so we made a simple adaptation: code-switching samples are assigned a new language accent code '2', while '0' denotes English and '1' denotes Chinese.

In the tables below, this model is labeled 'our implementation of CLMS/BLMS in [1]'. All of its samples are synthesized with the accent code that matches the language of the input text: '0' for English text, '1' for Chinese text, and '2' for code-switching text.
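The accent-code convention described above ('0' for English, '1' for Chinese, '2' for code-switching) can be illustrated with a minimal sketch. The function name `language_code` and the character-range heuristic are assumptions made for illustration only; the page does not say how the language of each input text was determined, and the codes are written here as integers rather than strings.

```python
import re

# Language accent codes as described in the text above.
LANG_ENGLISH = 0
LANG_CHINESE = 1
LANG_CODE_SWITCHING = 2

# Assumed heuristic (not from the source): a character in the CJK Unified
# Ideographs block marks Chinese, an ASCII letter marks English.
_HAN = re.compile(r"[\u4e00-\u9fff]")
_LATIN = re.compile(r"[A-Za-z]")


def language_code(text: str) -> int:
    """Return the accent code matching the language(s) found in `text`."""
    has_han = bool(_HAN.search(text))
    has_latin = bool(_LATIN.search(text))
    if has_han and has_latin:
        return LANG_CODE_SWITCHING
    if has_han:
        return LANG_CHINESE
    if has_latin:
        return LANG_ENGLISH
    raise ValueError("text contains neither Chinese characters nor Latin letters")
```

Under this heuristic, a sentence such as "用UC浏览器搜索 I believe I can fly" is treated as code-switching because it contains both scripts.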

Original Voices


Data-sufficient
DB1
LJS
DB4-Mandarin
DB4-English
DB4-Code-switching

English Sentence Synthesis


Text | CLMS | BLMS | Our implementation of CLMS in [1] | Our implementation of BLMS in [1]
Their solution requires development of the human capacity for social interest. | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
His most significant scientific publications were studies of birds and animals. | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
They established royal commissions to recover illegally held church lands. | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
Unfortunately, others separate on the basis of accumulated hatred. | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1

Mandarin Sentence Synthesis


Text | CLMS | BLMS | Our implementation of CLMS in [1] | Our implementation of BLMS in [1]
建筑设计师莱伊恩受命设计了英国温泽市政府大厅 | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
现在呢有很多朋友都喜欢打游戏 | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
堪忧本来就是令人担忧的意思 | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
这个时候有两个身影就向着火场逆行 | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1

Code-switching Sentence Synthesis


Text | CLMS | BLMS | Our implementation of CLMS in [1] | Our implementation of BLMS in [1]
用UC浏览器搜索 I believe I can fly (Use the UC browser to search "I believe I can fly") | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
我那个 company culture 其实就是 everyone help each other (Our company culture is that everyone helps each other) | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
most of the time 就是去玩一下 (Most of the time, we just hang around) | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1
其实我很难判断 in my heart i think my chinese is better but people tell me that my english 是比较好 (Actually, it's hard for me to tell. In my heart, I think my Chinese is better, but people tell me that my English is better) | DB4, LJS, DB1 | LJS, DB1 | DB4, LJS, DB1 | LJS, DB1

Reference

[1] Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Z. Chen, R. Skerry-Ryan, Y. Jia, A. Rosenberg, B. Ramabhadran, Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, in: Proc. Interspeech 2019, pp. 2080–2084.