Convergence and optimality of wide RNNs in the mean-field regime
Speaker
Andrea Agazzi - Università di Pisa
Abstract
Recurrent neural networks (RNNs) are a family of neural network architectures traditionally used to learn from data with a time-series structure. As the name suggests, these networks have a recurrent structure: at each timestep, the (hidden) state of the network is fed back to the model as an input, allowing it to maintain a "memory" of past inputs. In this talk, we extend a series of results on the training of wide neural networks in the so-called "mean-field" regime to the RNN architecture. More specifically, we prove that the gradient descent training dynamics of Elman-type RNNs converge in an appropriate sense, as the width of the network diverges, to a set of "mean-field" ODEs. Furthermore, we prove that, under some conditions on the data and on the initialization of the network, the fixed points of these limiting "mean-field" dynamics are globally optimal. This is joint work with Jianfeng Lu and Sayan Mukherjee.
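To make the recurrent structure described above concrete, here is a minimal sketch (not taken from the talk) of an Elman-type RNN forward pass, where the hidden state is fed back at every timestep and the scalar output is averaged over the N hidden units. The function and variable names, and the 1/N output scaling suggestive of a mean-field parametrization, are illustrative assumptions.

```python
import numpy as np

def elman_rnn_forward(x_seq, W_in, W_rec, w_out, activation=np.tanh):
    """Run an Elman-type RNN over a sequence and return a scalar prediction.

    x_seq : array of shape (T, d) -- input time series
    W_in  : array of shape (N, d) -- input-to-hidden weights
    W_rec : array of shape (N, N) -- hidden-to-hidden (recurrent) weights
    w_out : array of shape (N,)   -- hidden-to-output weights
    """
    N = W_in.shape[0]
    h = np.zeros(N)  # hidden state: the network's "memory" of past inputs
    for x_t in x_seq:
        # the hidden state is fed back as an input at each timestep
        h = activation(W_in @ x_t + W_rec @ h)
    # illustrative mean-field-style scaling: average over the N hidden units
    return w_out @ h / N

# Example usage with random weights (purely illustrative)
rng = np.random.default_rng(0)
T, d, N = 10, 3, 500
x_seq = rng.normal(size=(T, d))
W_in = rng.normal(size=(N, d))
W_rec = rng.normal(size=(N, N)) / np.sqrt(N)
w_out = rng.normal(size=N)
print(elman_rnn_forward(x_seq, W_in, W_rec, w_out))
```

In this kind of parametrization, taking the width N to infinity is what produces the limiting "mean-field" description of the training dynamics mentioned in the abstract.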
Bio
Andrea Agazzi is Assistant Professor in the Mathematics Department at the University of Pisa. After a PhD in theoretical physics at the University of Geneva, he was Griffith Research Assistant Professor in the Mathematics Department at Duke University. His interests span broadly across probability theory, stochastic analysis, and their applications, in particular to problems in deep learning theory.
When
May 3rd 2023, 15:00
Where
Room 322, UniGe DIBRIS, Via Dodecaneso 35
The seminar will be streamed online on Teams; details below.
Meeting ID: 360 543 033 187
Passcode: PiwCVp