Innovations in Text-Guided Visual Content Generation


Wang Hao

Nanyang Technological University
Mon, Jul 17 2023 - 11:00 am (GMT + 7)
About Speaker

Wang Hao is a final-year PhD candidate in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. He received his B.E. degree from Huazhong University of Science and Technology, China. His research focuses on developing AI-powered perception and generation algorithms for the multimodal domain. In particular, his recent work investigates translation between visual and text data to generate controllable content efficiently and robustly. He has published first-author papers at top-tier conferences and journals in computer vision and multimedia, including CVPR, ECCV, IEEE TPAMI, and IEEE TIP.


Text-guided visual content generation is a significant task in generative AI that focuses on translating semantic information from text into visual content. A key challenge in this domain is generating complex, high-quality visuals while keeping the output controllable. In this talk, we will introduce two frameworks: StyleGAN-based inversion and online alignment. Both aim to overcome this challenge by enabling high-fidelity visual generation and cross-modal semantic matching simultaneously. With our approach, visual content is generated directly from textual input at inference time, streamlining the process into a single step.

Related seminars

Anh Nguyen

Microsoft GenAI

The Revolution of Small Language Models
Fri, Mar 8 2024 - 02:30 pm (GMT + 7)

Thang D. Bui

Australian National University (ANU)

Recent Progress on Grokking and Probabilistic Federated Learning
Fri, Jan 26 2024 - 10:00 am (GMT + 7)

Tim Baldwin

MBZUAI, The University of Melbourne

Tue, Jan 9 2024 - 10:30 am (GMT + 7)

Quan Vuong

Google DeepMind

Scaling Robot Learning
Wed, Dec 27 2023 - 10:00 am (GMT + 7)