TFML Talk: A mean-field view on transformer models
Speaker
Andrea Agazzi - Institute of Mathematical Statistics and Actuarial Science - Universität Bern
This talk is part of the TFML PhD School but is open to everyone interested. You can view the full program below.
Abstract
Transformers are a central architecture in modern deep learning, forming the backbone of large language models such as ChatGPT. In this talk, I will present a mathematical framework for studying how information—represented as "tokens"—evolves through the layers of such neural networks. Specifically, we consider a family of partial differential equations that describe how the distribution of tokens—modeled as particles interacting in a mean-field way—changes with depth.
Numerical experiments reveal that, under certain conditions, these dynamics exhibit a metastable clustering phenomenon, where tokens group into well-separated clusters that evolve slowly over time. A rigorous analysis of this behavior uncovers a range of open questions and unexpected connections to various areas of mathematics.
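The clustering phenomenon described in the abstract can be reproduced in a few lines. The sketch below is not the speaker's code; it is a minimal illustration, under assumed parameters (`beta`, `dt`, `steps` are arbitrary choices), of the kind of mean-field model studied in this line of work: tokens are particles on the unit sphere, and each one moves toward a softmax-attention-weighted average of the others, with depth playing the role of time.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 32, 3           # number of tokens, embedding dimension
beta = 4.0             # inverse temperature of the softmax attention
dt, steps = 0.05, 400  # Euler step size and number of "layers"

# Random initial tokens on the unit sphere S^{d-1}
x = rng.normal(size=(n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)

def mean_cosine(x):
    """Average pairwise cosine similarity: 1 means fully clustered."""
    g = x @ x.T
    return (g.sum() - n) / (n * (n - 1))

before = mean_cosine(x)

for _ in range(steps):
    attn = np.exp(beta * (x @ x.T))          # unnormalized attention weights
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over tokens
    v = attn @ x                             # attention-weighted average
    v -= np.sum(v * x, axis=1, keepdims=True) * x  # project onto tangent space
    x = x + dt * v                           # Euler step of the dynamics
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # retract to the sphere

after = mean_cosine(x)
print(f"mean cosine similarity: {before:.3f} -> {after:.3f}")
```

Tracking the mean pairwise cosine similarity over depth makes the metastability visible: it rises quickly as nearby tokens coalesce into clusters, then plateaus for long stretches while well-separated clusters persist before eventually merging.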
Bio
Andrea leads the group of Stochastic Analysis and Applications in the Mathematics and Statistics Department at the University of Bern, where he serves as Associate Professor. Before moving to Bern, he was Assistant Professor (RTD/b) in the Mathematics Department at the University of Pisa. Before that, Andrea was Griffiths Assistant Research Professor in the Mathematics Department at Duke University. He obtained his PhD in Theoretical Physics at the University of Geneva under the supervision of Jean-Pierre Eckmann, after graduating in physics from Imperial College London and ETH Zurich.
When
Wednesday, June 25, 2025, 14:00
Where
Room 509, UniGe DIBRIS/DIMA, Via Dodecaneso 35