Abstract:In object detection, determining which anchors to assign as positive or negative samples, known as anchor assignment, has been revealed as a core procedure that can significantly affect a model's performance. In this paper we propose a novel anchor assignment strategy that adaptively separates anchors into positive and negative samples for a ground truth bounding box according to the model's learning status such that it is able to reason about the separation in a probabilistic manner. To do so we first calculate the scores of anchors conditioned on the model and fit a probability distribution to these scores. The model is then trained with anchors separated into positive and negative samples according to their probabilities. Moreover, we investigate the gap between the training and testing objectives and propose to predict the Intersection-over-Unions of detected boxes as a measure of localization quality to reduce the discrepancy. The combined score of classification and localization qualities serving as a box selection metric in non-maximum suppression well aligns with the proposed anchor assignment strategy and leads significant performance improvements. The proposed methods only add a single convolutional layer to RetinaNet baseline and does not require multiple anchors per location, so are efficient. Experimental results verify the effectiveness of the proposed methods. Especially, our models set new records for single-stage detectors on MS COCO test-dev dataset with various backbones. Code is available at https://github.com/kkhoot/PAA.
Abstract:We propose a novel traffic sign detection system that simultaneously estimates the location and precise boundary of traffic signs using convolutional neural network (CNN). Estimating the precise boundary of traffic signs is important in navigation systems for intelligent vehicles where traffic signs can be used as 3D landmarks for road environment. Previous traffic sign detection systems, including recent methods based on CNN, only provide bounding boxes of traffic signs as output, and thus requires additional processes such as contour estimation or image segmentation to obtain the precise sign boundary. In this work, the boundary estimation of traffic signs is formulated as a 2D pose and shape class prediction problem, and this is effectively solved by a single CNN. With the predicted 2D pose and the shape class of a target traffic sign in an input image, we estimate the actual boundary of the target sign by projecting the boundary of a corresponding template sign image into the input image plane. By formulating the boundary estimation problem as a CNN-based pose and shape prediction task, our method is end-to-end trainable, and more robust to occlusion and small targets than other boundary estimation methods that rely on contour estimation or image segmentation. The proposed method with architectural optimization provides an accurate traffic sign boundary estimation which is also efficient in compute, showing a detection frame rate higher than 7 frames per second on low-power mobile platforms.