Release of the VinAI Translate text translation models

July 28, 2022

VinAI is pleased to publicly release the pre-trained text translation models “vinai/vinai-translate-vi2en” and “vinai/vinai-translate-en2vi”, which are currently used in the translation component of our VinAI Translate system. They are state-of-the-art models for Vietnamese-to-English and English-to-Vietnamese translation, and can be used with the popular “transformers” library.

Please find details about the pre-trained models at: https://github.com/VinAIResearch/VinAI_Translate.
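
As a rough illustration of using the models with “transformers”, the sketch below loads one translation direction and translates a single sentence. It is not taken from the repository. The model identifiers come from this announcement, but the language codes ("vi_VN", "en_XX"), the beam-search settings, and the helper names `load_pair` and `translate` are assumptions for illustration. Please check the repository above for the recommended usage.

```python
# Hub identifiers as given in the announcement.
MODEL_IDS = {
    "vi2en": "vinai/vinai-translate-vi2en",
    "en2vi": "vinai/vinai-translate-en2vi",
}

# Assumption: mBART-style (source, target) language codes.
# Confirm these against the repository README before relying on them.
LANG_CODES = {
    "vi2en": ("vi_VN", "en_XX"),
    "en2vi": ("en_XX", "vi_VN"),
}


def load_pair(direction):
    """Load the tokenizer and model for one direction (hypothetical helper)."""
    # Imported lazily so this file can be read or imported without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    src_lang, _ = LANG_CODES[direction]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_IDS[direction], src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_IDS[direction])
    return tokenizer, model


def translate(text, tokenizer, model, direction, num_beams=5):
    """Translate one string with beam search (hypothetical helper)."""
    _, tgt_lang = LANG_CODES[direction]
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # Assumption: decoding starts from the target-language token.
        decoder_start_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        num_beams=num_beams,
        early_stopping=True,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

With “transformers” and PyTorch installed, a call might look like `tokenizer, model = load_pair("vi2en")` followed by `translate("Xin chào", tokenizer, model, "vi2en")`. The first call downloads the model weights from the Hugging Face Hub.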

Experimental results of the pre-trained models can be found in our VinAI Translate system paper “A Vietnamese-English Neural Machine Translation System”, which will be presented at the Interspeech 2022 Show & Tell session.

Other NLP resources from VinAI:

BARTpho (INTERSPEECH 2022): Pre-trained sequence-to-sequence models for Vietnamese.
QA-CarManual (IUI 2022): Demo video of a Vietnamese speech-based question answering system over car manuals.
PhoMT (EMNLP 2021): A high-quality and large-scale benchmark dataset for Vietnamese-English machine translation.
PhoATIS (INTERSPEECH 2021): An intent detection and slot filling dataset for Vietnamese.
PhoNLP (NAACL 2021): A BERT-based multi-task learning toolkit for Vietnamese POS tagging, named entity recognition and dependency parsing.
PhoNER_COVID19 (NAACL 2021): A dataset for Vietnamese named entity recognition.
ViText2SQL (EMNLP 2020 Findings): A dataset for Vietnamese Text2SQL semantic parsing.
PhoBERT (EMNLP 2020 Findings): Pre-trained language models for Vietnamese.
BERTweet (EMNLP 2020): A pre-trained language model for English Tweets.
COVID19Tweet (WNUT 2020): A dataset released for the WNUT 2020 Shared Task on “Identification of informative COVID-19 English Tweets”.