SEMINAR

Variable Selection with Theoretical Guarantees on High-dimensional Data

Speaker

Binh Nguyen

Working
Telecom Paris
Timeline
Fri, Sep 30 2022 - 10:00 am (GMT + 7)
About Speaker

Binh Nguyen is a postdoctoral researcher at Telecom Paris, France. He obtained his doctoral degree in statistics from the Département de Mathématiques d’Orsay and INRIA, and a master's degree in Data Science from Paris-Saclay University. His research interests are in high-dimensional statistics, optimization, and, more recently, the application of optimal transport to structured prediction problems in machine learning.

Abstract

In many scientific applications, increasingly large datasets are being acquired to describe biological or physical phenomena more accurately. While the dimensionality of the resulting measurements has increased, the number of available samples is often limited by physical or financial constraints. Performing statistical inference in such high-dimensional settings remains a hard problem that suffers from the curse of dimensionality. In this talk, we will first give an introduction to knockoff filters, a recent advance in multivariate analysis that controls the False Discovery Rate (FDR) under limited distributional assumptions. We then present a method for aggregating several samplings to address the randomness of the knockoff filter, one of its major limitations. We provide non-asymptotic theoretical results on the aggregated knockoff, specifically guaranteed FDR control, which rely on concentration inequalities. Furthermore, we extend the method with a version that can scale to extremely high-dimensional regimes. One of the key steps is to use randomized clustering to reduce the dimension and avoid the curse of dimensionality, and then to ensemble several runs to tame the bias that comes from selecting a fixed clustering. We show that our algorithms perform reasonably well in practical applications from the life sciences, such as neuroscience, medical imaging and genomics.
