Video Understanding: from Representation Learning to Open-World, Long-term Reasoning


Du Tran

Meta AI Research
Fri, Jul 22 2022 - 02:00 pm (GMT + 7)
About Speaker

Du Tran is a staff research scientist at Meta AI Research. He graduated with a Ph.D. in computer science from Dartmouth College and an M.S. in computer science from the University of Illinois at Urbana-Champaign, receiving the Dartmouth Presidential Fellowship and the Vietnam Education Fellowship. His research interests are in computer vision, machine learning, and computer graphics, with specific interests in video understanding, representation learning, and multimodal modeling.


Video understanding is a fundamental problem in computer vision, with applications including autonomous vehicles, robot learning, and visual perception. Although the last few years have seen considerable work on video understanding, many challenging problems remain unsolved. In this talk, I will present some of our recent work in video understanding, including cross-modal self-supervised learning of video and audio representations and open-world instance segmentation. Finally, I will speculate on several potential future research directions in this area.
