Natural Language Processing

We aim to conduct cutting-edge research and become a regional hub in Asia for natural language processing (NLP) and language technology. Given our location, we are naturally drawn to language problems and challenges in the region that might otherwise be overlooked by the research community. Our goal is not only to create new language-agnostic tools and knowledge for low-resource languages in the region, but also to build the very best practical NLP technology for Vietnamese. Consequently, we are advancing the state of the art in low-resource language problems, language modeling and translation, conversational AI, information extraction, and related areas.

New technology requires new fundamental research. To this end, our team collaborates with the Machine Learning team on the foundations of machine learning for NLP, such as self-supervised learning, adversarial learning, multi-task learning, graph neural networks, and knowledge graphs, and with the Computer Vision team on multimodal research in vision and language.

The NLP team has helped boost the global visibility of VinAI by establishing a strong network of collaborators among prominent researchers all over the world, including researchers at the University of Oregon in the USA, Nanyang Technological University in Singapore, and the University of Melbourne and Monash University in Australia. We have produced substantial research output, with papers published at top-tier NLP/AI conferences on a wide range of topics, including but not limited to the following:
- Foundation models and large language models
- Text classification and summarization
- Text and speech translation
- Question answering and dialogue systems
- Spoken language understanding
- Tagging, syntactic and semantic parsing
- Relation and event extraction
- Knowledge graph embedding
- Language grounding to vision
- Resources and evaluation

NLP

Findings of ACL

Retrieving Relevant Context to Align Representations for Cross-lingual Event Detection

We study the problem of cross-lingual transfer learning for event detection (ED) where…

NLP

InterSpeech Top Tier

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

We present XPhoneBERT, the first multilingual model pre-trained to learn phoneme representations for…

NLP

CIKM

A Capsule Network-based Model for Learning Node Embeddings

In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured…

Related publications

GenAI

NLP

LREC-COLING

Improving Vietnamese-English Medical Machine Translation

June 28, 2024

Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi, Wray Buntine

NLP

LREC-COLING

BKEE: Pioneering Event Extraction in the Vietnamese Language

June 28, 2024

Thi-Nhung Nguyen, Bang Tien Tran, Trong-Nghia Luu, Thien Huu Nguyen, Kiem-Hieu Nguyen

GenAI

NLP

Findings of ACL

Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

June 28, 2024

Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

GenAI

NLP

Findings of ACL

Realistic Evaluation of Toxicity in Large Language Models

June 28, 2024

Tinh Son Luong, Thanh-Thien Le, Linh Van Ngo, Thien Huu Nguyen

GenAI

NLP

ACL Top Tier

UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages

June 28, 2024

Trinh Pham*, Khoi M. Le*, Luu Anh Tuan

Do not miss these Seminars & Workshops

Jey Han Lau

University of Melbourne

Rumour and Disinformation Detection in Online Conversations

Thu, Sep 14 2023 - 10:00 am (GMT + 7)

Tim Baldwin

Mohamed bin Zayed University of Artificial Intelligence

Fairness in Natural Language Processing

Tue, Dec 20 2022 - 02:00 pm (GMT + 7)

Anh Tuan Luu

VinAI Research

Towards Robustness Against Natural Language Adversarial Attacks

Fri, Aug 14 2020 - 03:00 pm (GMT + 7)

Released Source Code

No.	Code	Paper	Conference	Year
01.	3D-UCaps	3D-UCaps: 3D Capsules Unet for Volumetric Image Segmentation	MICCAI	2021
02.	BARTpho	BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese	InterSpeech	2021
03.	Blur-kernel-space-exploring	Explore Image Deblurring via Blur Kernel Space	CVPR	2021