Video Understanding: from Representation Learning to Open-World, Long-term Reasoning


Du Tran

Meta AI Research
Fri, Jul 22 2022 - 02:00 pm (GMT + 7)
About Speaker

Du Tran is a staff research scientist at Meta AI Research. He graduated with a Ph.D. in computer science from Dartmouth College and an M.S. in computer science from the University of Illinois at Urbana-Champaign, receiving the Dartmouth Presidential Fellowship and the Vietnam Education Fellowship. His research interests are in computer vision, machine learning, and computer graphics, with specific interests in video understanding, representation learning, and multimodal modeling.


Video understanding is a fundamental problem in computer vision, with applications including autonomous vehicles, robot learning, and visual perception. Although the field has seen considerable work in the last few years, many challenging video understanding problems remain unsolved. In this talk, I will present some of our recent work in video understanding, including cross-modal self-supervised learning of video and audio representations and open-world instance segmentation. Finally, I will speculate on several potential future research directions in this area.

Related seminars

Representation Learning with Graph Autoencoders and Applications to Music Recommendation
Fri, Jul 26 2024 - 10:00 am (GMT + 7)

Trieu Trinh

Google DeepMind

AlphaGeometry: Solving IMO Geometry without Human Demonstrations
Fri, Jul 5 2024 - 10:00 am (GMT + 7)

Tat-Jun (TJ) Chin

Adelaide University

Quantum Computing in Computer Vision: A Case Study in Robust Geometric Optimisation
Fri, Jun 7 2024 - 11:00 am (GMT + 7)

Fernando De la Torre

Carnegie Mellon University

Human Sensing for AR/VR
Wed, Apr 24 2024 - 07:00 am (GMT + 7)