In this paper, we study the sample complexity lower bound for learning a $d$-layer feed-forward, fully-connected neural network for binary classification, using information-theoretic tools. Specifically, we propose a backward data generating process, in which the input is generated conditionally on the binary output and the network is parametrized by the weight matrices of its hidden layers. The resulting sample complexity lower bound is of order $\Omega(\log(r) + p / (r d))$, where $p$ is the dimension of the input, $r$ is the rank of the weight matrices, and $d$ is the number of hidden layers. To the best of our knowledge, this is the first information-theoretic sample complexity lower bound for this class of networks.
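
As a concrete illustration of the backward generating setup described above (a minimal sketch, not the paper's exact construction), the snippet below samples the binary label first and then pushes a label-dependent seed through $d$ hidden layers whose weight matrices have rank at most $r$ to produce the $p$-dimensional input. The hidden width, the tanh nonlinearity, the label-dependent mean shift, and the Gaussian noise scale are all illustrative assumptions.

```python
import numpy as np

def low_rank_weight(rng, out_dim, in_dim, r):
    """Draw a random weight matrix of rank at most r (illustrative)."""
    U = rng.standard_normal((out_dim, r))
    V = rng.standard_normal((r, in_dim))
    return U @ V / np.sqrt(r)

def backward_generate(rng, n, p, d, r, hidden=64):
    """Hypothetical backward data generating process: sample the binary
    output y first, then generate the p-dimensional input x by passing a
    label-dependent seed through d hidden layers with rank-r weights."""
    dims = [hidden] * d + [p]  # widths of the hidden layers, ending at the input
    Ws = [low_rank_weight(rng, dims[i + 1], dims[i], r) for i in range(d)]
    y = rng.integers(0, 2, size=n)                                # binary output
    h = rng.standard_normal((n, hidden)) + (2 * y[:, None] - 1)   # label-dependent seed
    for W in Ws:                                                  # propagate through hidden layers
        h = np.tanh(h @ W.T)
    x = h + 0.1 * rng.standard_normal((n, p))                     # observed noisy input
    return x, y

rng = np.random.default_rng(0)
X, y = backward_generate(rng, n=1000, p=32, d=3, r=4)
print(X.shape, y.mean())  # (1000, 32), roughly balanced labels
```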