ML@GT Seminar Series | A Picture of the Prediction Space of Deep Neural Networks

Featuring Pratik Chaudhari, University of Pennsylvania 

Abstract: I will argue that deep networks work well because of a characteristic structure in the space of learnable tasks. The input correlation matrix for typical tasks has a “sloppy” eigenspectrum whose eigenvalues decay linearly on a logarithmic scale. As a consequence, the Hessian and the Fisher Information Matrix of a trained network also have sloppy eigenspectra. Using this idea, I will derive an analytical, non-vacuous PAC-Bayes bound on the generalization error of general deep networks.
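
As a rough illustration of what a sloppy eigenspectrum looks like, here is a minimal sketch in Python; the synthetic data, dimensions, and decay rate are assumptions made purely for illustration and do not come from the referenced papers.

```python
# A minimal sketch of a "sloppy" spectrum, assuming synthetic data whose
# column variances decay geometrically (dimensions and decay rate are
# illustrative assumptions, not values from the papers).
import numpy as np

rng = np.random.default_rng(0)

n, d = 2000, 256
scales = np.exp(-0.05 * np.arange(d))      # geometric decay of column scales
X = rng.standard_normal((n, d)) * scales   # n samples in d dimensions

C = X.T @ X / n                            # input correlation matrix (d x d)
eigvals = np.linalg.eigvalsh(C)[::-1]      # eigenvalues, sorted descending

# "Sloppy" means log(lambda_k) is approximately affine in the index k,
# i.e., the eigenvalues decay linearly on a logarithmic scale.
k = np.arange(1, d + 1)
slope, _ = np.polyfit(k, np.log(eigvals), 1)
print(f"fitted slope of log-eigenvalues: {slope:.4f} per index")
```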

I will show that the training process in deep learning explores a remarkably low-dimensional manifold, with dimension as low as three. Networks with a wide variety of architectures, sizes, and optimization and regularization methods lie on the same manifold. Networks trained on different tasks (e.g., different subsets of ImageNet) using different methods (e.g., supervised, transfer, meta-, semi-, and self-supervised learning) also lie on this same low-dimensional manifold.
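
Reference 2 embeds training trajectories in prediction space using InPCA with the Bhattacharyya distance between the networks' predicted distributions. The sketch below substitutes classical multidimensional scaling for InPCA and random toy trajectories for real networks, purely to show the shape of the computation; every quantity in it is an illustrative assumption.

```python
# A minimal sketch of embedding training trajectories in prediction space.
# Toy random-walk "trajectories" stand in for real training runs, and
# classical MDS stands in for the InPCA embedding used in the paper.
import numpy as np

rng = np.random.default_rng(0)

def bhattacharyya(p, q):
    # Bhattacharyya distance between per-sample class probabilities,
    # averaged over samples.
    return -np.log(np.sum(np.sqrt(p * q), axis=-1)).mean()

# Toy prediction space: T checkpoints x N samples x C classes, two runs.
T, N, C = 20, 100, 10
def toy_trajectory():
    logits = rng.standard_normal((T, N, C)).cumsum(axis=0) * 0.3
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

checkpoints = np.concatenate([toy_trajectory(), toy_trajectory()])

# Classical MDS: eigendecompose the double-centered squared-distance
# matrix and ask how much structure the top few dimensions carry.
M = len(checkpoints)
D2 = np.array([[bhattacharyya(checkpoints[i], checkpoints[j]) ** 2
                for j in range(M)] for i in range(M)])
J = np.eye(M) - np.ones((M, M)) / M
B = -0.5 * J @ D2 @ J
w = np.sort(np.abs(np.linalg.eigvalsh(B)))[::-1]   # |.| since B may be indefinite
print("fraction of variance in top 3 dims:", w[:3].sum() / w.sum())
```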

I will show that typical tasks are highly redundant functions of their inputs. Many perception tasks, from visual recognition, semantic segmentation, optical flow, and depth estimation to vocalization discrimination, can be predicted extremely well regardless of whether the data are projected onto the principal subspace where they vary the most, an intermediate subspace with moderate variability, or the bottom subspace where they vary the least.
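
A minimal sketch of this subspace-projection experiment follows, with scikit-learn's digits dataset and logistic regression standing in for the perception tasks and deep networks studied in reference 4; the subspace dimension k and the variance threshold are illustrative choices.

```python
# A minimal sketch: project the data onto the top, middle, and bottom
# principal subspaces and fit a linear classifier on each projection.
# Assumptions: the digits dataset and logistic regression stand in for
# the perception tasks and deep networks of reference 4.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X - X.mean(axis=0)                       # center before computing PCA
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Principal directions of the training data, sorted by variance; drop
# numerically-null directions (e.g., border pixels that are always zero).
_, s, Vt = np.linalg.svd(Xtr, full_matrices=False)
Vt = Vt[s > 1e-3 * s[0]]

k = 16  # subspace dimension, an illustrative choice
subspaces = {
    "top (most variance)": Vt[:k],
    "middle": Vt[len(Vt) // 2 - k // 2 : len(Vt) // 2 + k // 2],
    "bottom (least variance)": Vt[-k:],
}
for name, V in subspaces.items():
    clf = LogisticRegression(max_iter=5000)
    clf.fit(Xtr @ V.T, ytr)
    print(f"{name:26s} test accuracy: {clf.score(Xte @ V.T, yte):.3f}")
```

To the extent the redundancy claim holds, the three projections should yield comparable accuracies, though this toy dataset is far smaller than the benchmarks studied in the paper.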

References: 

  1. Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML '22]. https://arxiv.org/abs/2110.14163
  2. The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold. Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, and Pratik Chaudhari. [PNAS 2024]. https://arxiv.org/abs/2305.01604
  3. A picture of the space of typical learnable tasks. Rahul Ramesh, Jialin Mao, Itay Griniasty, Rubing Yang, Han Kheng Teoh, Mark Transtrum, James P. Sethna, and Pratik Chaudhari. [ICML '23]. https://arxiv.org/abs/2210.17011
  4. Many Perception Tasks are Highly Redundant Functions of their Input Data. Rahul Ramesh, Anthony Bisulco, Ronald W. DiTullio, Linran Wei, Vijay Balasubramanian, Kostas Daniilidis, and Pratik Chaudhari. https://arxiv.org/abs/2407.13841


Bio: Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and in Computer and Information Science at the University of Pennsylvania. He is a core member of the GRASP Laboratory. From 2018 to 2019, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD in Computer Science from UCLA, and his Master's and Engineer's degrees in Aeronautics and Astronautics from MIT. He was part of nuTonomy Inc. (now Motional, a Hyundai-Aptiv joint venture) from 2014 to 2016. He is the recipient of the Amazon Machine Learning Research Award, the NSF CAREER Award, and the Intel Rising Star Faculty Award.

Event Details

Date/Time:

  • Wednesday, October 23, 2024
    12:00 pm - 1:00 pm
Location: CODA 9th Floor Atrium

Related Media


  • 2024.1023 ML Seminar Announcement-Pratik Chaudhari.jpg

For More Information Contact

Shelli Hatcher, Program and Operations Manager
