Faster saddle-point optimization for solving large-scale Markov decision processes


Joan Bas Serrano

Universitat Pompeu Fabra
Fri, Feb 21 2020 - 10:00 am (GMT + 7)
About Speaker

Joan Bas Serrano is a PhD student and teaching assistant in the Artificial Intelligence and Machine Learning group at Universitat Pompeu Fabra (UPF). Under the supervision of Gergely Neu, he’s currently working on theoretical reinforcement learning. He is mainly interested in the development of reinforcement learning algorithms with a solid mathematical basis and strong theoretical guarantees. His current research is based on the linear programming formulation of MDPs and the saddle-point optimization theory.

He obtained his bachelor’s in Physics at Universitat Autònoma de Barcelona in 2016 and his M.Sc. at UPF in 2017, cursing the program “Master in Intelligent and Interactive Systems”. He did his master thesis titled “ A generative model of user activity in the Integrated Learning Design Environment“ under the advice of Vicenç Gomez and Davinia Hernandez-Leo.


We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point problem. Our work aims at achieving a better understanding of this relaxed optimization problem by characterizing the conditions necessary for convergence to the optimal policy, and designing an optimization algorithm enjoying fast convergence rates that are independent of the size of the state space. Notably, our characterization points out some potential issues with previous work.

Related seminars

Anh Nguyen

Microsoft GenAI

The Revolution of Small Language Models
Fri, Mar 8 2024 - 02:30 pm (GMT + 7)

Thang D. Bui

Australian National University (ANU)

Recent Progress on Grokking and Probabilistic Federated Learning
Fri, Jan 26 2024 - 10:00 am (GMT + 7)

Tim Baldwin

MBZUAI, The University of Melbourne

Tue, Jan 9 2024 - 10:30 am (GMT + 7)

Quan Vuong

Google DeepMind

Scaling Robot Learning
Wed, Dec 27 2023 - 10:00 am (GMT + 7)