XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

May 22, 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. XPhoneBERT has the same model architecture as BERT-base and is trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody, and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT in the hope that it will facilitate future research and downstream TTS applications for multiple languages.

Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

InterSpeech 2023

