SEMINAR

The lottery ticket hypothesis for gigantic pre-trained models

Speaker

Atlas Wang

Working
University of Texas at Austin
Timeline
Fri, May 28 2021 - 10:00 am (GMT + 7)
About Speaker

Professor Atlas Wang is currently an Assistant Professor of Electrical and Computer Engineering at UT Austin, leading the VITA research group (⁨https://vita-group.github.io/⁩). He is broadly interested in the fields of machine learning, computer vision, optimization, and their interdisciplinary applications. His latest interests focus on automated machine learning (AutoML), learning-based optimization, machine learning robustness, and efficient deep learning. He has received many research awards and scholarships, including most recently an ARO Young Investigator award, an IBM Faculty Research Award, an Amazon Research Award (AWS AI), an Adobe Data Science Research Award, and four research competition prizes from CVPR/ICCV/ECCV.

Abstract

In NLP and computer vision, enormous pre-trained models have become the standard starting point for training on a range of downstream tasks. In parallel, work on the lottery ticket hypothesis has shown that models contain smaller matching subnetworks capable of training in isolation to full accuracy and transferring to other tasks. We have combined these observations to assess whether such trainable, transferrable subnetworks exist in various pre-trained models. Taking BERT as one example, for a range of downstream tasks, we indeed find matching subnetworks at 40% to 90% sparsity. We find these subnetworks at (pre-trained) initialization, a deviation from prior NLP research where they emerge only after some amount of training. As another example found in computer vision, from all pre-trained weights obtained by ImageNet classification, simCLR and MoCo, we are also consistently able to locate matching subnetworks at 59.04% to 96.48% sparsity that transfer to multiple downstream tasks, whose performance also see no degradation compared to using full pre-trained weights. As large-scale pre-training becomes an increasingly central paradigm in deep learning, our results demonstrate that the main lottery ticket observations remain relevant in this context. A project webpage is available at: https://tianlong-chen.github.io/Project_Page/index.html

Related seminars

Tim Baldwin

MBZUAI, The University of Melbourne

Safe, open, locally-aligned language models
Mon, Dec 16 2024 - 02:00 pm (GMT + 7)

Alessio Del Bue

Italian Institute of Technology (IIT)

From Spatial AI to Embodied AI: The Path to Autonomous Systems
Mon, Dec 16 2024 - 10:00 am (GMT + 7)

Dr. Xiaoming Liu

Michigan State University

Person Recognition at a Distance
Mon, Dec 9 2024 - 10:00 am (GMT + 7)

Dr Lan Du

Monash University

Uncertainty Estimation for Multi-view/Multimodal Data
Fri, Dec 6 2024 - 10:00 am (GMT + 7)