TFML Talk: A mean-field view on transformer models
Speaker
Andrea Agazzi - Institute of Mathematical Statistics and Actuarial Science - Universität Bern
This talk is part of the TFML PhD School but is open to everyone interested. You can view the full program below.
Abstract
Transformers are a central architecture in modern deep learning, forming the backbone of large language models such as ChatGPT. In this talk, I will present a mathematical framework for studying how information—represented as "tokens"—evolves through the layers of such neural networks. Specifically, we consider a family of partial differential equations that describe how the distribution of tokens—modeled as particles interacting in a mean-field way—changes with depth.
Numerical experiments reveal that, under certain conditions, these dynamics exhibit a metastable clustering phenomenon, where tokens group into well-separated clusters that evolve slowly over time. A rigorous analysis of this behavior uncovers a range of open questions and unexpected connections to various areas of mathematics.
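The clustering phenomenon described in the abstract can be reproduced in a few lines. The sketch below is not the speaker's code; it is a minimal illustration, under assumed parameters (`beta`, `dt`, `steps` are arbitrary choices), of the kind of mean-field model studied in this line of work: tokens are particles on the unit sphere, and each one moves toward a softmax-attention-weighted average of the others, with depth playing the role of time.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 32, 3           # number of tokens, embedding dimension
beta = 4.0             # inverse temperature of the softmax attention
dt, steps = 0.05, 400  # Euler step size and number of "layers"

# Random initial tokens on the unit sphere S^{d-1}
x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)

def mean_cosine(x):
    """Average pairwise cosine similarity: 1 means fully clustered."""
    g = x @ x.T
    return (g.sum() - n) / (n * (n - 1))

before = mean_cosine(x)

for _ in range(steps):
    attn = np.exp(beta * (x @ x.T))          # unnormalized attention weights
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over tokens
    v = attn @ x                             # attention-weighted average
    v -= np.sum(v * x, axis=1, keepdims=True) * x  # project onto tangent space
    x = x + dt * v                           # Euler step of the dynamics
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract to the sphere

after = mean_cosine(x)
print(f"mean cosine similarity: {before:.3f} -> {after:.3f}")
```

Tracking the mean pairwise cosine similarity over depth makes the metastability visible: it rises quickly as nearby tokens coalesce into clusters, then plateaus for long stretches while well-separated clusters persist before eventually merging.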
Bio
Andrea leads the group of Stochastic Analysis and Applications in the Mathematics and Statistics Department at the University of Bern, where he serves as Associate Professor. Before moving to Bern, he was Assistant Professor (RTD/b) in the Mathematics Department at the University of Pisa. Before that, Andrea was Griffiths Assistant Research Professor in the Mathematics Department at Duke University. He obtained his PhD in Theoretical Physics at the University of Geneva under the supervision of Jean-Pierre Eckmann, after graduating in physics from Imperial College London and ETH Zurich.
When
Wednesday, June 25, 2025, 14:00
Where
Room 509, UniGe DIBRIS/DIMA, Via Dodecaneso 35