Seminar

MaLGa Colloquia - Implicit bias of gradient descent in deep learning

23/03/2026

Title

MaLGa Colloquia - Implicit bias of gradient descent in deep learning


Speaker

Holger Rauhut - LMU Munich (DE)


Abstract

Deep neural networks are usually trained by minimizing a non-convex loss functional via (stochastic) gradient descent methods. In the overparameterized scenario, where there are more parameters than training examples, it is observed empirically that the training loss is commonly driven to zero by gradient descent methods, i.e., the neural network interpolates the data exactly. It is puzzling that, at the same time, the learned networks often generalize well to unseen data. This is in stark contrast to intuition from classical statistics, which would predict overfitting. A current working hypothesis is that the chosen optimization algorithm has a significant influence on the selection of the learned network. In fact, in the overparameterized context there are many global minimizers, so the optimization method induces an implicit bias on the computed solution. It seems that gradient descent methods and their stochastic variants favor networks of low complexity (in a suitable sense still to be understood) and hence appear to be very well suited for large classes of real data. Initial attempts at understanding the implicit bias phenomenon consider the simplified setting of linear networks, i.e., (deep) factorizations of matrices. This has revealed a surprising relation to the field of sparse and low-rank recovery (compressive sensing), in the sense that gradient descent favors sparse diagonal or low-rank matrices in certain situations. Moreover, initial results on learning two-layer ReLU networks show that sparse ReLU expansions may be favored by gradient flow.

Despite such initial theoretical results in simplified scenarios, the understanding of the implicit bias phenomenon in deep learning remains largely open.

Based on joint works with El Mehdi Achour, Wiebe Bartolomaeus, Hung-Hsu Chou, Johannes Maly, Maria Matveev, and Rachel Ward.
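
To make the connection to sparse recovery concrete, here is a minimal numerical sketch; it is an illustration, not code from the speaker or his coauthors. Plain gradient descent is run on an overparameterized "diagonal linear network", x = u*u - v*v, fitted to an underdetermined linear system from a small initialization; under this (assumed) setup, the iterates tend to select a sparse interpolating solution. All dimensions, variable names, and constants (A, y, alpha, lr) are illustrative choices.

```python
# Sketch: implicit bias of gradient descent on a diagonal linear network.
# We minimize 0.5 * ||A(u*u - v*v) - y||^2 over (u, v) with plain gradient
# descent from a small initialization alpha. The system A x = y is
# underdetermined (n < d), so it has infinitely many interpolants; gradient
# descent on this overparameterization tends to pick a sparse one.
# All constants and names here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 20, 50, 3                       # measurements, dimension, sparsity
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d)                      # planted s-sparse vector
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true

alpha = 1e-3                              # small initialization scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 0.01                                 # step size

for _ in range(200_000):
    x = u * u - v * v
    g = A.T @ (A @ x - y)                 # gradient of the loss w.r.t. x
    u -= lr * 2.0 * g * u                 # chain rule: dx/du =  2u
    v += lr * 2.0 * g * v                 # chain rule: dx/dv = -2v

x = u * u - v * v
print("training loss:", 0.5 * np.sum((A @ x - y) ** 2))
print("distance to planted sparse vector:", np.linalg.norm(x - x_true))
print("coordinates with |x_i| > 1e-3:", int(np.sum(np.abs(x) > 1e-3)))
```

The initialization scale alpha is the key knob in this sketch: for small alpha the computed interpolant typically concentrates on the support of the planted sparse vector (the "sparse diagonal" bias mentioned in the abstract), while for larger alpha this selection effect weakens.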


Bio

Diploma studies in Mathematics at Technical University of Munich (1996-2001)

Doctorate in Mathematics at Technical University of Munich (2004)

Postdoc positions at University of Wroclaw (2005) and University of Vienna (2005-2008)

Habilitation in Mathematics at University of Vienna (2008)

Professor of Mathematics (Bonn Junior Fellow), University of Bonn (2008-2013)

Professor of Mathematics, Chair of Mathematics of Information Processing, RWTH Aachen University (2013-2023)

Professor of Mathematics, Chair of Mathematics of Information Processing, Ludwig-Maximilians-University Munich (since 2023)


ERC Starting Grant Sparse and Low Rank Recovery (2011-2015)

Spokesperson of the Collaborative Research Center (SFB 1481) Sparsity and Singular Structures, RWTH Aachen University (2022-2023)


Research Interests

Mathematical Foundations of Machine Learning and of Signal Processing

Compressive Sensing, Optimization, Probability in High Dimensions, Harmonic Analysis


When

Monday, March 23rd, 2026, 16:00


Where

Room 706, UniGe DIBRIS/DIMA, Via Dodecaneso 35