Seminar

Operator World-models for Reinforcement Learning

17/12/2024


Title

Operator World-models for Reinforcement Learning


Speaker

Carlo Ciliberto - University College London


Abstract

Policy Mirror Descent (PMD) is a powerful and theoretically sound methodology for sequential decision making. However, it is not directly applicable to Reinforcement Learning (RL) due to the inaccessibility of explicit action-value functions. We address this challenge by introducing a novel approach based on learning a world model of the environment using conditional mean embeddings. Leveraging tools from operator theory, we derive a closed-form expression of the action-value function in terms of the world model via simple matrix operations. Combining these estimators with PMD leads to POWR, a new RL algorithm for which we prove convergence rates to the global optimum. Preliminary experiments in finite and infinite state settings support the effectiveness of our method.


Bio

Carlo Ciliberto is Associate Professor at University College London and Deputy Director of the AI Centre - UCL. He obtained his bachelor's and master's degrees in Mathematics at the Università Roma Tre and a PhD in machine learning applied to robotics and computer vision at the University of Genova and Istituto Italiano di Tecnologia. He was a Postdoctoral Researcher at the Massachusetts Institute of Technology with the Center for Brains, Minds and Machines and a Lecturer (Assistant Professor) at Imperial College London before joining UCL, where he now carries out his main research activity. Carlo's research interests focus on foundational aspects of machine learning within the framework of statistical learning theory. He is particularly interested in the role of “structure” (be it in the form of prior knowledge or structural constraints) in reducing the sample complexity of learning algorithms, with the goal of making them more sustainable both computationally and financially. He has investigated these questions within the settings of structured prediction, multi-task learning and meta-learning, with applications to computer vision, robotics and recommendation systems.


When

Tuesday December 17th, 14:00


Where

Room 322, UniGe DIBRIS/DIMA, Via Dodecaneso 35