The state-of-the-art approaches for image classification are based on neural networks. Mathematically, the task of classifying images is equivalent to finding the function that maps an image to the label it is associated with. To rigorously establish the success of neural network methods, we should first prove that the function has an efficient neural network representation, and then design provably efficient training algorithms to find such a representation. Here, we achieve the first goal based on a set of assumptions about the patterns in the images. The validity of these assumptions is very intuitive in many image classification problems, including but not limited to, recognizing handwritten digits.