Abstract:Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly rely on pseudo-labeling to exploit unlabeled data. However, unreliable pseudo-labels can undermine the semi-supervised training process. In this paper, we propose an algorithm called Multi-Level Label Correction (MLLC), which uses graph neural networks to capture structural relationships in Semantic-Level Graphs (SLGs) and Class-Level Graphs (CLGs) in order to rectify erroneous pseudo-labels. Specifically, SLGs represent semantic affinities between pairs of pixel features, and CLGs describe classification consistencies between pairs of pixel labels. With the support of proximate pattern information from the graphs, MLLC can rectify incorrectly predicted pseudo-labels and facilitate discriminative feature representations. We design an end-to-end network to train and perform this effective label correction mechanism. Experiments demonstrate that MLLC significantly improves supervised baselines and outperforms state-of-the-art approaches in different scenarios on the Cityscapes and PASCAL VOC 2012 datasets. Specifically, MLLC improves the supervised baseline by at least 5% and 2% with DeepLabV2 and DeepLabV3+, respectively, under different partition protocols.
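To make the two graph types concrete, the following is a minimal sketch, assuming cosine similarity as the semantic affinity and exact label agreement as the classification consistency. Neither choice is stated in the abstract. The function names, the `suspect_pairs` heuristic, and its threshold are hypothetical illustrations only; MLLC itself uses graph neural networks for the correction step, which this sketch does not implement.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_graph(features):
    """Assumed SLG: dense matrix of pairwise affinities between pixel features."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def class_graph(labels):
    """Assumed CLG: 1.0 where two pixels share a pseudo-label, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def suspect_pairs(slg, clg, threshold=0.9):
    """Hypothetical heuristic (not the MLLC GNN): pixel pairs whose features
    are very similar but whose pseudo-labels disagree."""
    n = len(slg)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if slg[i][j] >= threshold and clg[i][j] == 0.0
    ]


# Three toy pixels: the first two have near-identical features but different labels.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
pseudo_labels = [0, 1, 1]
slg = semantic_graph(features)
clg = class_graph(pseudo_labels)
candidates = suspect_pairs(slg, clg)
```

In this toy example, pixels 0 and 1 are flagged because their features nearly coincide while their pseudo-labels differ, which is the kind of structural disagreement between the two graphs that a correction mechanism could exploit.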
Abstract:Semi-supervised learning frameworks usually adopt mutual learning approaches with multiple submodels to learn from different perspectives. To avoid transferring erroneous pseudo labels between these submodels, a high threshold is usually used to filter out a large number of low-confidence predictions for unlabeled data. However, such filtering cannot fully exploit unlabeled data with low prediction confidence. To overcome this problem, in this work, we propose a mutual learning framework based on pseudo-negative labels. Negative labels are those to which a corresponding data item does not belong. In each iteration, one submodel generates pseudo-negative labels for each data item, and the other submodel learns from these labels. The roles of the two submodels are exchanged after each iteration until convergence. By reducing the prediction probability on pseudo-negative labels, the dual model can improve its prediction ability. We also propose a mechanism to select a few pseudo-negative labels to feed into the submodels. In the experiments, our framework achieves state-of-the-art results on several main benchmarks. Specifically, with our framework, the error rates of the 13-layer CNN model are 9.35% and 7.94% for CIFAR-10 with 1000 and 4000 labels, respectively. In addition, for the non-augmented MNIST with only 20 labels, our framework achieves an error rate of 0.81%, which is much lower than that of other approaches. Our approach also demonstrates a significant performance improvement in domain adaptation.
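The idea of reducing the prediction probability on pseudo-negative labels can be sketched as a loss of the form $-\log(1 - p_c)$ averaged over the negative classes $c$. This is a minimal sketch under stated assumptions: the abstract gives neither the exact loss nor the selection mechanism, so the loss form, the rule in `select_pseudo_negatives` (take the `k` least likely classes), and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_negatives(probs, k):
    """Hypothetical selection rule: the k classes one submodel finds least likely."""
    order = sorted(range(len(probs)), key=lambda c: probs[c])
    return order[:k]


def negative_label_loss(probs, negatives, eps=1e-12):
    """Assumed loss: mean of -log(1 - p_c) over the pseudo-negative classes.
    It shrinks as the learner assigns less probability to those classes."""
    terms = [math.log(max(1.0 - probs[c], eps)) for c in negatives]
    return -sum(terms) / len(negatives)


# Submodel A proposes pseudo-negative labels for one unlabeled item ...
probs_a = softmax([2.0, 0.5, -1.0, 0.1])
negatives = select_pseudo_negatives(probs_a, k=2)

# ... and submodel B is penalized for putting mass on them.
loss_bad = negative_label_loss([0.1, 0.1, 0.7, 0.1], negatives)
loss_good = negative_label_loss([0.7, 0.1, 0.1, 0.1], negatives)
```

In the next iteration the two submodels would swap roles, with B proposing the negatives and A learning from them.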
Abstract:Convolutional neural networks can achieve remarkable performance in semantic segmentation tasks. However, such neural network approaches rely heavily on costly pixel-level annotation. Semi-supervised learning is a promising solution to this issue, but its performance still falls far behind that of its fully supervised counterpart. This work proposes a cross-teacher training (CTT) framework with three modules that significantly improves traditional semi-supervised learning approaches. The core is a cross-teacher module, which simultaneously reduces the coupling among peer networks and the error accumulation between teacher and student networks. In addition, we propose two complementary contrastive learning modules. The high-level module transfers high-quality knowledge from labeled data to unlabeled data and promotes separation between classes in feature space. The low-level module encourages low-quality features to learn from high-quality features among peer networks. In experiments, the cross-teacher module significantly improves the performance of traditional student-teacher approaches, and our framework outperforms state-of-the-art methods on benchmark datasets. Our source code of CTT will be released.
Abstract:PSNR and SSIM are the most widely used metrics in super-resolution problems, because they are easy to use and can evaluate the similarities between generated images and reference images. However, single image super-resolution is an ill-posed problem: there are multiple corresponding high-resolution images for the same low-resolution image, so the similarities cannot fully reflect the restoration effect. The perceptual quality of generated images is also important, but PSNR and SSIM do not reflect perceptual quality well. To solve this problem, we propose a method called regional differential information entropy (RDIE) to measure both similarity and perceptual quality. To overcome the problem that traditional image information entropy cannot reflect structural information, we propose to measure the information entropy of every region with a sliding window. Considering that the human visual system is more sensitive to brightness differences at low brightness, we adopt $\gamma$ quantization rather than linear quantization. To accelerate the method, we reorganize the calculation procedure of information entropy with a neural network. Through experiments on our IQA dataset and PIPAL, this paper demonstrates that RDIE can better quantify the perceptual quality of images, especially GAN-based images.
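The sliding-window entropy and $\gamma$ quantization steps can be illustrated with a minimal sketch. It assumes grayscale intensities in $[0, 1]$, a power-law curve with exponent $1/\gamma$, 16 quantization levels, and Shannon entropy in bits. None of these parameters come from the abstract, and the function names are hypothetical. The sketch covers only the per-region entropy map. It implements neither the differential comparison against a reference image nor the neural-network acceleration.

```python
import math
from collections import Counter


def gamma_quantize(value, levels=16, gamma=2.2):
    """Quantize an intensity in [0, 1] after a power-law curve, so that
    dark values are spread over more bins than with linear quantization."""
    v = min(max(value, 0.0), 1.0) ** (1.0 / gamma)
    return min(int(v * levels), levels - 1)


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a list of discrete symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def regional_entropy(image, window=3, stride=1, levels=16, gamma=2.2):
    """Entropy of every window x window region of a 2D list of intensities."""
    q = [[gamma_quantize(p, levels, gamma) for p in row] for row in image]
    height, width = len(q), len(q[0])
    result = []
    for i in range(0, height - window + 1, stride):
        row = []
        for j in range(0, width - window + 1, stride):
            patch = [q[i + di][j + dj]
                     for di in range(window) for dj in range(window)]
            row.append(shannon_entropy(patch))
        result.append(row)
    return result


# A flat image carries no regional information under this measure.
flat = [[0.5] * 4 for _ in range(4)]
flat_map = regional_entropy(flat, window=3)
```

Because entropy is computed per region rather than over the whole histogram, two images with identical global histograms but different spatial layouts can yield different entropy maps, which is the structural sensitivity the abstract refers to.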