Generic object detection is one of the most fundamental and important problems in computer vision. When it comes to large scale object detection for thousands of categories, it is unpractical to provide all the bounding box labels for each category. In this paper, we propose a novel hierarchical structure and joint training framework for large scale semi-supervised object detection. First, we utilize the relationships among target categories to model a hierarchical network to further improve the performance of recognition. Second, we combine bounding-box-level labeled images and image-level labeled images together for joint training, and the proposed method can be easily applied in current two-stage object detection framework with excellent performance. Experimental results show that the proposed large scale semi-supervised object detection network obtains the state-of-the-art performance, with the mAP of 38.1% on the ImageNet detection validation dataset.