SEMINAR

Efficient Model Development in the Era of Large Language Models

Speaker

Dr. Tu Vu

Affiliation
Virginia Tech
Timeline
Tue, Nov 5 2024 - 09:30 am (GMT + 7)
About Speaker

Dr. Tu Vu is an Assistant Professor in the Department of Computer Science at Virginia Tech. Prior to joining Virginia Tech, he held the position of Research Scientist at Google DeepMind for a year after receiving his PhD in Computer Science from the University of Massachusetts Amherst. Vu’s research aims to develop effective and efficient methods for advancing and democratizing artificial intelligence in the era of large language models. During his PhD, Vu worked as a student researcher at Google DeepMind, contributing to the development of models like Gemini and Flan-T5. His work on parameter-efficient transfer learning through soft prompts was widely adopted within Google for tuning large language models.

Abstract

The development of modern large language models often involves a post-training stage in which a pre-trained model is fine-tuned on a large multitask mixture to improve capabilities such as instruction following, reasoning, and coding. This raises two key research questions: (1) how can we efficiently determine the optimal dataset proportions for the mixture, and (2) how can we integrate new capabilities into the model without sacrificing performance on existing ones? In this talk, I will present two of my recent research papers that shed light on these questions. First, I will discuss our work on developing Foundational Large Auto-rater Models (FLAMe), which introduces a novel tail-patch fine-tuning approach for evaluating the impact of each dataset on targeted distributions. This approach enables us to determine the optimal dataset proportions, yielding results competitive with example-proportional mixing while significantly reducing computational costs. Second, I will cover our latest work on large-scale model merging, which offers a promising solution for efficiently incorporating new capabilities: as model capability and capacity grow, merging preserves most of the performance on existing capabilities while significantly enhancing zero-shot generalization to novel tasks, often matching or exceeding the performance of multitask training. Taken together, I hope these results will spur more fundamental research into the efficient development of large language models.
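To make the two ideas in the abstract concrete, here are two minimal, hedged sketches. They are illustrative only and do not reproduce the speaker's actual implementations; all function names, APIs, and hyperparameters below are assumptions. The first sketch shows a tail-patch-style probe in the spirit described above: briefly fine-tune a checkpoint on a small slice of one candidate dataset and measure the change on a held-out target set, so that datasets producing larger gains can be given larger mixture proportions.

```python
# Sketch of a tail-patch-style data-mixture probe (illustrative, not the FLAMe code).
# Assumes a HuggingFace-style model whose forward pass returns an object with `.loss`.
import copy
import torch

def tail_patch_score(model, candidate_loader, target_eval_fn, optimizer_fn, num_steps=100):
    """Return the change in target-set score after a short burst of fine-tuning
    on one candidate dataset. A larger positive delta suggests the dataset
    deserves a larger share of the final mixture."""
    baseline = target_eval_fn(model)
    patched = copy.deepcopy(model)              # leave the base checkpoint untouched
    optimizer = optimizer_fn(patched.parameters())
    patched.train()
    for step, batch in enumerate(candidate_loader):
        if step >= num_steps:                   # "tail patch": only a few extra steps
            break
        loss = patched(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return target_eval_fn(patched) - baseline
```

The second sketch shows one common form of model merging, uniform (or weighted) parameter averaging of checkpoints that share an architecture. The merging method discussed in the talk may differ; this is only a baseline illustration of merging in weight space.

```python
# Sketch of model merging by parameter averaging (illustrative baseline).
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average the parameters of several fine-tuned checkpoints that share one
    architecture. `weights` optionally gives per-model mixing coefficients."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name, ref in state_dicts[0].items():
        if not torch.is_floating_point(ref):
            merged[name] = ref.clone()          # copy integer buffers (e.g. counters) as-is
        else:
            merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical): merge an instruction-following specialist and a coding
# specialist into a single set of weights, then load them into one model object.
# merged = merge_state_dicts([model_a.state_dict(), model_b.state_dict()])
# model_a.load_state_dict(merged)
```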

Related seminars

Tim Baldwin

MBZUAI, The University of Melbourne

Safe, open, locally-aligned language models
Mon, Dec 16 2024 - 02:00 pm (GMT + 7)

Alessio Del Bue

Italian Institute of Technology (IIT)

From Spatial AI to Embodied AI: The Path to Autonomous Systems
Mon, Dec 16 2024 - 10:00 am (GMT + 7)

Dr. Xiaoming Liu

Michigan State University

Person Recognition at a Distance
Mon, Dec 9 2024 - 10:00 am (GMT + 7)

Dr. Lan Du

Monash University

Uncertainty Estimation for Multi-view/Multimodal Data
Fri, Dec 6 2024 - 10:00 am (GMT + 7)