Abstract:One major impediment in rapidly deploying object detection models for industrial applications is the lack of large annotated datasets. We currently have presented the Sacked Carton Dataset(SCD) that contains carton images from three scenarios such as comprehensive pharmaceutical logistics company(CPLC), e-commerce logistics company(ECLC), fruit market(FM). However, due to domain shift, the model trained with carton datasets from one of the three scenarios in SCD has poor generalization ability when applied to the rest scenarios. To solve this problem, a novel image synthesis method is proposed to replace the foreground texture of the source datasets with the foreground instance texture of the target datasets. This method can greatly augment the target datasets and improve the model's performance. We firstly propose a surfaces segmentation algorithm to identify the different surfaces of the carton instance. Secondly, a contour reconstruction algorithm is proposed to solve the problem of occlusion, truncation, and incomplete contour of carton instances. Finally, the Gaussian fusion algorithm is used to fuse the background from the source datasets with the foreground from the target datasets. The novel image synthesis method can largely boost AP by at least $4.3\%\sim6.5\%$ on RetinaNet and $3.4\%\sim6.8\%$ on Faster R-CNN for the target domain. And on the source domain, the performance AP can be improved by $1.7\%\sim2\%$ on RetinaNet and $0.9\%\sim1.5\%$ on Faster R-CNN. Code is available at https://github.com/hustgetlijun/RCAN.
Abstract:For most of the anchor-based detectors, Intersection over Union(IoU) is widely utilized to assign targets for the anchors during training. However, IoU pays insufficient attention to the closeness of the anchor's center to the truth box's center. This results in two problems: (1) only one anchor is assigned to most of the slender objects which leads to insufficient supervision information for the slender objects during training and the performance on the slender objects is hurt; (2) IoU can not accurately represent the alignment degree between the receptive field of the feature at the anchor's center and the object. Thus during training, some features whose receptive field aligns better with objects are missing while some features whose receptive field aligns worse with objects are adopted. This hurts the localization accuracy of models. To solve these problems, we firstly design Gaussian Guided IoU(GGIoU) which focuses more attention on the closeness of the anchor's center to the truth box's center. Then we propose GGIoU-balanced learning method including GGIoU-guided assignment strategy and GGIoU-balanced localization loss. The method can assign multiple anchors for each slender object and bias the training process to the features well-aligned with objects. Extensive experiments on the popular benchmarks such as PASCAL VOC and MS COCO demonstrate GGIoU-balanced learning can solve the above problems and substantially improve the performance of the object detection model, especially in the localization accuracy.
Abstract:Carton detection is an important technique in the automatic logistics system and can be applied to many applications such as the stacking and unstacking of cartons, the unloading of cartons in the containers. However, there is no public large-scale carton dataset for the research community to train and evaluate the carton detection models up to now, which hinders the development of carton detection. In this paper, we present a large-scale carton dataset named Stacked Carton Dataset(SCD) with the goal of advancing the state-of-the-art in carton detection. Images are collected from the internet and several warehourses, and objects are labeled using per-instance segmentation for precise localization. There are totally 250,000 instance masks from 16,136 images. In addition, we design a carton detector based on RetinaNet by embedding Offset Prediction between Classification and Localization module(OPCL) and Boundary Guided Supervision module(BGS). OPCL alleviates the imbalance problem between classification and localization quality which boosts AP by 3.1% - 4.7% on SCD while BGS guides the detector to pay more attention to boundary information of cartons and decouple repeated carton textures. To demonstrate the generalization of OPCL to other datasets, we conduct extensive experiments on MS COCO and PASCAL VOC. The improvement of AP on MS COCO and PASCAL VOC is 1.8% - 2.2% and 3.4% - 4.3% respectively.
Abstract:Due to the simpleness and high efficiency, single-stage object detectors have been widely applied in many computer vision applications . However, the low correlation between the classification score and localization accuracy of the predicted detections has severely hurt the localization accuracy of models. In this paper, IoU-aware single-stage object detector is proposed to solve this problem. Specifically, IoU-aware single-stage object detector predicts the IoU for each detected box. Then the classification score and predicted IoU are multiplied to compute the final detection confidence, which is more correlated with the localization accuracy. The detection confidence is then used as the input of the subsequent NMS and COCO AP computation, which will substantially improve the localization accuracy of models. Sufficient experiments on COCO and PASCAL VOC datasets demonstrate the effectiveness of IoU-aware single-stage object detector on improving model's localization accuracy. Without whistles and bells, the proposed method can substantially improve AP by $1.7\%\sim1.9\%$ and AP75 by $2.2\%\sim2.5\%$ on COCO \textit{test-dev}. On PASCAL VOC, the proposed method can substantially improve AP by $2.9\%\sim4.4\%$ and AP80, AP90 by $4.6\%\sim10.2\%$. The source code will be made publicly available.
Abstract:Single-stage detectors are efficient. However, we find that the loss functions adopted by single-stage detectors are sub-optimal for accurate localization. The standard cross entropy loss for classification is independent of localization task and drives all the positive examples to learn as high classification score as possible regardless of localization accuracy during training. As a result, there will be detections that have high classification score but low IoU or low classification score but high IoU. And the detections with low classification score but high IOU will be suppressed by the ones with high classification score but low IOU during NMS, hurting the localization accuracy. For the standard smooth L1 loss, the gradient is dominated by the outliers that have poorly localization accuracy and this is harmful for accurate localization. In this work, we propose IoU-balanced loss functions that consist of IoU-balanced classification loss and IoU-balanced localization loss to solve the above problems. The IoU-balanced classification loss focuses more attention on positive examples with high IOU and can enhance the correlation between classification and localization task. The IoU-balanced localization loss decreases the gradient of the examples with low IoU and increases the gradient of examples with high IoU, which can improve the localization accuracy of models. Sufficient studies on MS COCO demonstrate that both IoU-balanced classification loss and IoU-balanced localization loss can bring substantial improvement for the single-stage detectors. Without whistles and bells, the proposed methods can improve AP by 1.1% for single-stage detectors and the improvement for AP at higher IoU threshold is especially large, such as 2.3% for AP90. The source code will be made available.