NLP InterSpeech

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

May 22, 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. XPhoneBERT has the same architecture as BERT-base and is trained with the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody, and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT in the hope that it will facilitate future research and downstream TTS applications for multiple languages.

Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

InterSpeech 2023
