Abstract:The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method through dynamic loss decay (DLD) mechanism, inspired by the two phase ``early-learning'' and ``memorization'' learning dynamics of deep neural networks on clean and noisy samples. To be specific, we first observe the end point of early learning phase termed as EL, after which the models begin to memorize the false labels that significantly degrade the detection accuracy. Secondly, under the guidance of the training indicator, the losses of each sample are ranked in descending order, and we adaptively decay the losses of the top K largest ones (bad samples) in the following epochs. Because these large losses are of high confidence to be calculated with wrong labels. Experimental results show that the method achieves excellent noise resistance performance tested on multiple public datasets such as HRSC2016 and DOTA-v1.0/v2.0 with synthetic category label noise. Our solution also has won the 2st place in the "fine-grained object detection based on sub-meter remote sensing imagery" track with noisy labels of 2023 National Big Data and Computing Intelligence Challenge.
Abstract:The diversity of building architecture styles of global cities situated on various landforms, the degraded optical imagery affected by clouds and shadows, and the significant inter-class imbalance of roof types pose challenges for designing a robust and accurate building roof instance segmentor. To address these issues, we propose an effective framework to fulfill semantic interpretation of individual buildings with high-resolution optical satellite imagery. Specifically, the leveraged domain adapted pretraining strategy and composite dual-backbone greatly facilitates the discriminative feature learning. Moreover, new data augmentation pipeline, stochastic weight averaging (SWA) training and instance segmentation based model ensemble in testing are utilized to acquire additional performance boost. Experiment results show that our approach ranks in the first place of the 2023 IEEE GRSS Data Fusion Contest (DFC) Track 1 test phase ($mAP_{50}$:50.6\%). Note-worthily, we have also explored the potential of multimodal data fusion with both optical satellite imagery and SAR data.
Abstract:Unifying the correlative single-view satellite image building extraction and height estimation tasks indicates a promising way to share representations and acquire generalist model for large-scale urban 3D reconstruction. However, the common spatial misalignment between building footprints and stereo-reconstructed nDSM height labels incurs degraded performance on both tasks. To address this issue, we propose a Height-hierarchy Guided Dual-decoder Network (HGDNet) to estimate building height. Under the guidance of synthesized discrete height-hierarchy nDSM, auxiliary height-hierarchical building extraction branch enhance the height estimation branch with implicit constraints, yielding an accuracy improvement of more than 6% on the DFC 2023 track2 dataset. Additional two-stage cascade architecture is adopted to achieve more accurate building extraction. Experiments on the DFC 2023 Track 2 dataset shows the superiority of the proposed method in building height estimation ({\delta}1:0.8012), instance extraction (AP50:0.7730), and the final average score 0.7871 ranks in the first place in test phase.
Abstract:The Guided Filter (GF) is well-known for its linear complexity. However, when filtering an image with an n-channel guidance, GF needs to invert an n x n matrix for each pixel. To the best of our knowledge existing matrix inverse algorithms are inefficient on current hardwares. This shortcoming limits applications of multichannel guidance in computation intensive system such as multi-label system. We need a new GF-like filter that can perform fast multichannel image guided filtering. Since the optimal linear complexity of GF cannot be minimized further, the only way thus is to bring all potentialities of current parallel computing hardwares into full play. In this paper we propose a hardware-efficient Guided Filter (HGF), which solves the efficiency problem of multichannel guided image filtering and yields competent results when applying it to multi-label problems with synthesized polynomial multichannel guidance. Specifically, in order to boost the filtering performance, HGF takes a new matrix inverse algorithm which only involves two hardware-efficient operations: element-wise arithmetic calculations and box filtering. In order to break the linear model restriction, HGF synthesizes a polynomial multichannel guidance to introduce nonlinearity. Benefiting from our polynomial guidance and hardware-efficient matrix inverse algorithm, HGF not only is more sensitive to the underlying structure of guidance but also achieves the fastest computing speed. Due to these merits, HGF obtains state-of-the-art results in terms of accuracy and efficiency in the computation intensive multi-label
Abstract:Computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve the constant-time BF whose complexity is irrelevant to the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and shiftability property. Although each of the above techniques suffers from accuracy and efficiency problems, previous algorithm designers were used to take only one of them to assemble fast implementations due to the hardness of combining them together. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting edge implementation that solves these problems. Jointly employing five techniques: kernel truncation, best N -term approximation as well as previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and, therefore, can draw upon one another's strong point to overcome deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing the efficiency at running time.