Deep learning models based on CNNs are predominantly used in image classification tasks. Such approaches, assuming independence of object categories, normally use a CNN as a feature learner and apply a flat classifier on top of it. Object classes in many settings have hierarchical relations, and classifiers exploiting these relations should perform better. We propose hierarchical classification models combining a CNN to extract hierarchical representations of images, and an RNN or sequence-to-sequence model to capture a hierarchical tree of classes. In addition, we apply residual learning to the RNN part in oder to facilitate training our compound model and improve generalization of the model. Experimental results on a real world proprietary dataset of images show that our hierarchical networks perform better than state-of-the-art CNNs.