On the Existence and on the Role of Wide Flat Minima in Deep Learning
Riccardo Zecchina - Bocconi University
In this talk we will try to answer the following question: Where does the propensity to learn efficiently and to generalize well come from in large-scale artificial neural networks (ANNs)? These two properties, which in principle are independent and often in competition, are known to coexist in deep learning, and yet a unifying theoretical framework is missing. We discuss two fundamental features which could represent the building blocks of such a theory. First, we show that ANNs possess the peculiar structural property of having very wide flat minima in their weight space, which are crucial for achieving good generalization performance and avoiding overfitting. These regions are rare and coexist with narrow minima and saddles. Second, we show that the “deep learning” algorithms developed over the last decade do in fact target those rare regions.
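The intuition behind wide flat minima can be illustrated with a toy example (not from the talk; the loss functions and the perturbation-based flatness proxy below are illustrative assumptions). Two minima may reach the same loss value, but a wide minimum remains low-loss when the weights are randomly perturbed, while a narrow one does not, which is one common way to quantify flatness:

```python
import numpy as np

# Two toy 1-D losses with minima of equal depth (zero) but very
# different curvature: a narrow, sharp valley and a wide, flat one.
def loss_narrow(w):
    return 50.0 * (w + 1.0) ** 2   # high curvature around w = -1

def loss_wide(w):
    return 0.5 * (w - 2.0) ** 2    # low curvature around w = +2

def local_energy(loss, w_min, sigma=0.3, n=10000, seed=0):
    """Average loss under Gaussian perturbations of the weights:
    a simple proxy for how 'flat' the minimum at w_min is."""
    rng = np.random.default_rng(seed)
    return loss(w_min + sigma * rng.normal(size=n)).mean()

sharp = local_energy(loss_narrow, -1.0)  # ~ 50 * sigma^2 = 4.5
flat = local_energy(loss_wide, 2.0)      # ~ 0.5 * sigma^2 = 0.045

# The wide minimum is far more robust to weight perturbations.
print(f"narrow: {sharp:.3f}, wide: {flat:.3f}")
```

For a quadratic loss, the expected loss under Gaussian perturbation scales with the curvature, so the narrow minimum's perturbed loss is roughly 100 times larger here, even though both minima have identical loss at their exact centers.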
The research interests of Riccardo Zecchina (RZ) lie at the interface between statistical physics, computer science, and information theory. His current research focuses on machine learning and optimization. RZ obtained a master’s degree in Electronic Engineering from the Politecnico di Torino in 1989 and a PhD in theoretical physics from the University of Torino in 1993. He was a research scientist and head of the Statistical Physics Group at the International Centre for Theoretical Physics in Trieste (Italy) between 1997 and 2007. In 2007 he moved to the Politecnico di Torino as a full professor. He has been a visiting scientist at Microsoft Research (Redmond and Boston) and a visiting professor at the University of Orsay. Since 2017 he has been a full professor at Bocconi University in Milan, holding a chair in Machine Learning. A new lab on Artificial Intelligence was created in 2019 (www.artlab.unibocconi.it). The papers of RZ can be found on arXiv or Google Scholar. They have been published in multidisciplinary scientific journals such as Nature, Science, PNAS, and Physical Review Letters, as well as in specialized journals in theoretical physics, computer science, and applied mathematics.
International Awards:
- 2016: Lars Onsager Prize in Theoretical Statistical Physics from the American Physical Society, for the design of new classes of efficient algorithms and for the study of phase transitions in optimization problems.
- 2011: Advanced Grant from the European Research Council (ERC) for the project “Optimization and inference algorithms from the theory of disordered systems” (2010-2015).
2019-02-05 at 2:30 pm