Early fault diagnosis in complex mechanical systems such as gearboxes has always been a great challenge, even with recent developments in deep neural networks. The performance of a classic fault diagnosis system depends predominantly on the features extracted and the classifier subsequently applied. Although many feature extraction techniques have been proposed, these methods require substantial human involvement and depend heavily on domain expertise, and may thus be non-representative and biased from application to application. On the other hand, while deep neural network-based approaches offer adaptive feature extraction and inherent classification, they usually require a substantial set of training data, which hinders their use in engineering applications with limited training data, such as gearbox fault diagnosis. This paper develops a deep convolutional neural network-based transfer learning approach that not only provides adaptive feature extraction without pre-processing, but also requires only a small set of training data. The proposed approach performs gear fault diagnosis directly on raw accelerometer data without pre-processing, and experiments were conducted with training sets of various sizes. The superiority of the proposed approach is demonstrated by comparing its performance with that of other methods, such as a locally trained convolutional neural network and an angle-frequency analysis-based support vector machine. The achieved accuracy indicates that the proposed approach is not only viable and robust, but also has the potential to be readily applied to other fault diagnosis practices.