Training overparameterized convolutional neural networks with gradient-based methods is the most successful learning approach for image classification. However, its theoretical properties are far from understood, even for very simple learning tasks. In this work, we consider a simplified image classification task in which images consist of orthogonal patches and are learned by a 3-layer overparameterized convolutional network trained with stochastic gradient descent. We empirically identify a novel phenomenon in which the dot products between the learned pattern detectors and their detected patterns are governed by the pattern statistics of the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and prove that it holds for a simple setup with two points in the training set. Furthermore, we prove that if PSI holds, stochastic gradient descent has sample complexity $O(d^2\log(d))$, where $d$ is the filter dimension. In contrast, we show a lower bound on the VC dimension of our setting that is exponential in $d$. Taken together, our results provide strong evidence that PSI is a unique inductive bias of stochastic gradient descent that guarantees good generalization.
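To make the setup concrete, the following is a minimal sketch of the orthogonal-patch task and the PSI measurement described above. It is not the paper's code: the pattern layout, the fixed $\pm 1$ readout layer, the hinge loss, and all hyperparameters are illustrative assumptions chosen for clarity.

```python
# Minimal sketch (illustrative, not the paper's code) of the orthogonal-patch
# classification task and the PSI measurement. All names and hyperparameters
# below are assumptions.
import torch

torch.manual_seed(0)
d, n_patches, n_filters, n_train = 16, 4, 64, 200

# Orthogonal patterns: here the standard basis of R^d (any orthonormal set works).
patterns = torch.eye(d)
POS, NEG = 0, 1  # assumed label-defining patterns; the rest act as background

def sample(n):
    """Each image is n_patches patches: one patch carries the class pattern,
    the others are drawn uniformly from the background patterns."""
    y = (torch.rand(n) < 0.5).float() * 2 - 1
    X = torch.zeros(n, n_patches, d)
    for i in range(n):
        idx = torch.randint(2, d, (n_patches,))              # background patches
        idx[torch.randint(0, n_patches, (1,))] = POS if y[i] > 0 else NEG
        X[i] = patterns[idx]
    return X, y

X, y = sample(n_train)

# 3-layer network: convolutional filters W, ReLU + max-pooling over patch
# positions, and a fixed +/-1 readout (a common overparameterized setup).
W = torch.nn.Parameter(0.01 * torch.randn(n_filters, d))
u = torch.cat([torch.ones(n_filters // 2), -torch.ones(n_filters // 2)])

def net(X):
    act = torch.relu(X @ W.T)        # (n, n_patches, n_filters)
    pooled = act.max(dim=1).values   # max-pool over patch positions
    return pooled @ u                # one scalar output per image

opt = torch.optim.SGD([W], lr=0.1)
for step in range(2000):
    i = torch.randint(0, n_train, (1,))
    loss = torch.relu(1 - y[i] * net(X[i])).mean()  # hinge loss on one sample
    opt.zero_grad()
    loss.backward()
    opt.step()

# PSI measurement: dot products between learned filters and each pattern,
# set against the empirical pattern frequencies in the training set.
with torch.no_grad():
    dots = W @ patterns.T                                    # (n_filters, d)
    freq = torch.bincount(X.argmax(dim=-1).flatten(),
                          minlength=d).float() / (n_train * n_patches)
    print("max filter-pattern dot products:", dots.max(dim=0).values)
    print("empirical pattern frequencies:  ", freq)
```

Under PSI, the filter-pattern dot products printed at the end should track the relative frequencies of the patterns in the training set; the sketch exposes both quantities so they can be compared directly.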