Recent studies on multi-label image classification have focused on designing increasingly complex deep neural network architectures, for example by incorporating attention mechanisms and region proposal networks. Although performance gains have been reported in the literature, the backbone models and evaluation metrics vary from one work to another, which makes fair comparison difficult. Moreover, because properly investigated baselines are lacking, the advantages contributed by the proposed techniques remain unclear. To address these issues, we thoroughly investigate the mainstream deep convolutional neural network architectures for multi-label image classification and present a strong baseline. Using only data augmentation and model ensembling, we achieve better performance than previously reported results on three benchmark datasets. We hope this work will provide insights for future studies on multi-label image classification.