XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

May 22, 2023

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for the downstream text-to-speech (TTS) task. XPhoneBERT has the same model architecture as BERT-base and is trained using the RoBERTa pre-training approach on 330M phoneme-level sentences from nearly 100 languages and locales. Experimental results show that employing XPhoneBERT as an input phoneme encoder significantly boosts the performance of a strong neural TTS model in terms of naturalness and prosody, and also helps produce fairly high-quality speech with limited training data. We publicly release our pre-trained XPhoneBERT in the hope that it will facilitate future research and downstream TTS applications for multiple languages.
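To make the "RoBERTa pre-training approach" concrete, here is a minimal sketch of masked-language-model corruption applied to a phoneme sequence. It is an illustration only, not the authors' training code: the 15% selection rate and the 80/10/10 replacement split are the usual BERT/RoBERTa defaults and are assumed here, and the function name `mask_phonemes`, the `<mask>` string, the example phoneme sequence, and the `▁` word-boundary symbol are all hypothetical choices for this example.

```python
import random

MASK = "<mask>"  # placeholder mask symbol (assumption, not taken from the paper)


def mask_phonemes(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt one phoneme sequence for masked-language-model training.

    Returns (corrupted, labels). labels[i] is the original token at positions
    selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected; left untouched
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with the mask symbol
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random phoneme
        # remaining 10%: keep the original phoneme
    return corrupted, labels


# Illustrative phoneme sequence; "▁" stands for a word boundary (assumption).
phonemes = ["ð", "ɪ", "s", "▁", "ɪ", "z", "▁", "ɐ", "▁", "t", "ɛ", "s", "t"]
vocab = sorted(set(phonemes))

# Calling the function again on the same sentence draws a fresh mask pattern,
# which is the "dynamic masking" idea associated with RoBERTa.
corrupted, labels = mask_phonemes(phonemes, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

During pre-training, a model would be asked to predict the original phoneme at every position where `labels` is not `None`; the encoder trained this way is what the abstract describes plugging into a TTS model as the input phoneme encoder.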

Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen

InterSpeech 2023

Related publications

GenAI

NLP

LREC-COLING

Improving Vietnamese-English Medical Machine Translation

June 28, 2024

Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi, Wray Buntine

GenAI

NLP

Findings of ACL

Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

June 28, 2024

Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

GenAI

NLP

Findings of ACL

Realistic Evaluation of Toxicity in Large Language Models

June 28, 2024

Tinh Son Luong, Thanh-Thien Le, Linh Van Ngo, Thien Huu Nguyen

GenAI

NLP

ACL Top Tier

UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages

June 28, 2024

Trinh Pham*, Khoi M. Le*, Luu Anh Tuan
