Learning/Optimization over abstract non-manifold structures: A case in biological data analysis
Thien Le
Thien Le is a second year PhD student at Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) with Stefanie Jegelka group. He is interested broadly in theory of optimization over complex structures that arise in biological data and discrete/continuous algorithms used to address them. Prior to coming to MIT, he completed his undergraduate degree at the University of Illinois at Urbana-Champaign (UIUC) in May 2019, where he worked with Tandy Warnow on phylogeny research. He received the Most Outstanding Major Award in Mathematics and Computer Science from UIUC Department of Mathematics in April 2019.
Biological data analysis often involves optimization problems where the objective function is efficient to compute, but the feasible set is of complex combinatorial and geometric nature. In particular, phylogenetic studies involve inferring tree structures that depict evolutionary relationships, from information at their leaf nodes (DNA sequences of existing species, for example). One approach that works well in theory and practice models evolution as a parametric stochastic process and performs maximum likelihood analysis. While the likelihood function is easy to compute, the set of all tree structures over a fix number of leaves has high dimension and is, in fact, a non-manifold. Despite these obstacles, there are known discrete algorithms that solve this problem with efficient runtime and sample complexity. In this talk, we visit known results on the algebro-geometric properties of this space and expand on how smooth optimization can be a novel tool to design even more efficient algorithms/heuristics for these problems. Based on on-going work with Stefanie Jegelka.