SEMINAR

Understand and Reconstruct Multimodal Egocentric Scenes

Speaker

Chenliang Xu

Affiliation
University of Rochester
Timeline
Tue, Jan 7 2025 - 10:00 am (GMT + 7)
About Speaker

Chenliang Xu is an Associate Professor in the Department of Computer Science at the University of Rochester. He received his Ph.D. in Computer Science from the University of Michigan in 2016, an M.S. in Computer Science from the University at Buffalo in 2012, and a B.S. in Information and Computing Science from Nanjing University of Aeronautics and Astronautics, China, in 2010. His research originates in computer vision and tackles interdisciplinary topics, including video understanding, audio-visual learning, vision and language, and methods for trustworthy AI. Xu is a recipient of the James P. Wilmot Distinguished Professorship (2021), the University of Rochester Research Award (2021), the Best Paper Award Honorable Mention at the 17th Asian Conference on Computer Vision (2024), the Best Paper Award at the 17th ACM SIGGRAPH VRCAI Conference (2019), the Best Paper Award at the 14th Sound and Music Computing Conference (2017), and the University of Rochester AR/VR Pilot Award (2017). He has authored over 100 peer-reviewed papers in computer vision, machine learning, multimedia, and AI venues, and has served as an editor and area chair for various international journals and conferences.

Abstract

Every day, the world generates vast quantities of egocentric video from mixed/augmented reality, lifelogging, and robotics. These videos capture the world from a first-person (ego) perspective, hence the name egocentric videos. Understanding these videos and reconstructing egocentric scenes are essential to future AI applications. In this talk, I will first introduce two recent methods developed by my group that leverage large language models (LLMs) to understand multimodal third-person and egocentric videos. These methods show remarkable generalizability compared with traditional task-specific computer vision models. Following that, I will introduce methods leading to real-world audio-visual scene synthesis.

Related seminars

Tim Baldwin

MBZUAI, The University of Melbourne

Safe, open, locally-aligned language models
Mon, Dec 16 2024 - 02:00 pm (GMT + 7)

Alessio Del Bue

Italian Institute of Technology (IIT)

From Spatial AI to Embodied AI: The Path to Autonomous Systems
Mon, Dec 16 2024 - 10:00 am (GMT + 7)

Dr. Xiaoming Liu

Michigan State University

Person Recognition at a Distance
Mon, Dec 9 2024 - 10:00 am (GMT + 7)

Dr. Lan Du

Monash University

Uncertainty Estimation for Multi-view/Multimodal Data
Fri, Dec 6 2024 - 10:00 am (GMT + 7)