Abstract: Convolutional neural networks (CNNs) are a powerful tool for many computer vision tasks, achieving much better results than traditional methods. Because CNNs have very large capacity, training them typically requires a large amount of data, but labeled images are often expensive to obtain in practice. This is especially true for object detection, where annotating a bounding box for every object in the training set requires considerable human effort. Retail product detection is one such case, since there can be many different product categories. In this paper, we focus on applying a CNN to detect products from 324 categories in situ, while requiring no extra effort to label bounding boxes for any image. Our approach is based on an algorithm that extracts bounding boxes from an in-vitro dataset and an algorithm that simulates occlusion. We show that our methods are effective and useful for building a Faster R-CNN detection model. A similar idea is also applicable in other scenarios.