Hao Phung*, Quan Dao*, Trung Dao, Viet Hoang Phan, Dimitris N. Metaxas, Anh Tran
Computer Vision
Our research group aims to develop translational research and products that enhance millions of people’s lives. Seeing countless real-life problems involving image, video, and sensory data, we advance research in computer vision. Humans are at the center of our research, as reflected in topics such as face recognition and manipulation, eye gaze prediction, hand gesture recognition, and human behavior understanding. A complementary strand, pursued through our Generative AI research, is to understand how real-world imagery is formed and to reconstruct and manipulate it.
To make computer vision algorithms work in real-life scenarios, we identify practical challenges, including data scarcity and degraded data quality, and address them with few-shot learning and image/video enhancement techniques. We do not limit our research to imagery but extend it to other sensory data, such as 3D point clouds, and to combining vision with other modalities such as language. Our computer vision research therefore feeds into products that improve daily life, such as smart mobility and smart surveillance systems deployed on thousands of smart cars and smart cameras in Vietnam.
The Computer Vision team has helped raise the global visibility of VinAI by building a strong network of collaborations with prominent researchers around the world. We have published substantial research at top-tier AI venues across a wide range of topics, including but not limited to the following:
- Face recognition and analysis
- Human activity understanding
- Image generation and manipulation
- Few-shot learning
- Image/Video enhancement
- 3D Vision
- Vision and language
- Trustworthy Computer Vision
Related publications
Released Source Codes
| No. | Code | Paper | Conference | Year |
|---|---|---|---|---|
| 01 | Anti-DreamBooth | Anti-DreamBooth: Protecting users from personalized text-to-image synthesis | ICCV | 2023 |
| 02 | BERTweet | BERTweet: A pre-trained language model for English Tweets | EMNLP | 2020 |
| 03 | BARTpho | BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese | InterSpeech | 2021 |