GenAI
NLP
InterSpeech

FlowVocoder: A small footprint vocoder based normalizing-flow for speech synthesis

June 16, 2022

Recently, autoregressive neural vocoders have shown remarkable performance in generating high-fidelity speech and can produce synthetic speech in real time. However, although autoregressive neural vocoders such as WaveFlow can model waveform signals from mel-spectrograms, their parameter counts are too large for deployment on edge devices. NanoFlow, a state-of-the-art autoregressive neural vocoder, has a small number of parameters, but its performance is marginally lower than WaveFlow's. We therefore propose a new autoregressive neural vocoder, FlowVocoder, which has a small memory footprint and can generate high-fidelity audio in real time. The proposed model improves the density estimation of its flow blocks by using a mixture of cumulative distribution functions (CDFs) for the bipartite transformation. As a result, it can model waveform signals with a much smaller memory footprint than WaveFlow. In our experiments, FlowVocoder achieves results competitive with the baseline methods in both subjective and objective evaluations, and it is better suited to real-time text-to-speech applications.
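
To make the idea of a mixture-of-CDFs bipartite transformation concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a logistic mixture as the CDF family, and it replaces the conditioning network with a hypothetical `toy_conditioner`. All function names are illustrative. One half of the input (`x_a`) passes through unchanged and parameterizes a monotone mixture CDF that is applied elementwise to the other half (`x_b`).

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def mix_logistic_cdf(x, logits, means, log_scales):
    """CDF of a K-component logistic mixture. x: (N,), parameters: (N, K)."""
    z = (x[:, None] - means) / np.exp(log_scales)
    return np.sum(_softmax(logits) * _sigmoid(z), axis=-1)


def mix_logistic_pdf(x, logits, means, log_scales):
    """Derivative of the mixture CDF with respect to x (always positive)."""
    scales = np.exp(log_scales)
    s = _sigmoid((x[:, None] - means) / scales)
    return np.sum(_softmax(logits) * s * (1.0 - s) / scales, axis=-1)


def toy_conditioner(x_a, k=3, seed=0):
    """Hypothetical stand-in for a neural network: maps x_a to mixture parameters."""
    w = np.random.default_rng(seed).normal(size=(3, k))  # fixed for a given seed
    h = np.tanh(x_a[:, None, None] * w[None])            # shape (N, 3, K)
    return h[:, 0], h[:, 1], 0.5 * h[:, 2]               # logits, means, log_scales


def coupling_forward(x_a, x_b, params_fn):
    """Bipartite step: x_a is unchanged, x_b goes through the mixture CDF."""
    params = params_fn(x_a)
    y_b = mix_logistic_cdf(x_b, *params)
    # The Jacobian is diagonal, so its log-determinant is a sum of log-derivatives.
    log_det = np.sum(np.log(mix_logistic_pdf(x_b, *params)))
    return x_a, y_b, log_det


def coupling_inverse(x_a, y_b, params_fn, bound=50.0, iters=60):
    """Invert the monotone CDF by bisection on [-bound, bound]."""
    params = params_fn(x_a)
    lo = np.full_like(y_b, -bound)
    hi = np.full_like(y_b, bound)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        too_low = mix_logistic_cdf(mid, *params) < y_b
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    return 0.5 * (lo + hi)
```

Because a mixture CDF is strictly increasing, the step stays invertible however the conditioner sets the mixture parameters, and the mixture lets each element follow a more flexible curve than a single scale-and-shift. In a full flow, the output in (0, 1) would typically be mapped back to the real line (for example with a logit) before the next block. That step is omitted here for brevity.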

Manh Luong and Viet Anh Tran

InterSpeech 2022
