Jongbeom Baek

Semi-Supervised Learning of Monocular Depth Estimation via Consistency Regularization with K-way Disjoint Masking

Dec 22, 2022

Jongbeom Baek, Gyeongnyeon Kim, Seonghoon Park, Honggyu An, Matteo Poggi, Seungryong Kim

Abstract: Semi-Supervised Learning (SSL) has recently achieved success in various fields such as image classification, object detection, and semantic segmentation, which typically require a lot of labour to construct ground truth. In depth estimation especially, annotating training data is very costly and time-consuming, which makes the recent SSL regime an attractive solution. In this paper, for the first time, we introduce a novel framework for semi-supervised learning of monocular depth estimation networks, using consistency regularization to mitigate the reliance on large ground-truth depth data. We propose a novel data augmentation approach, called K-way disjoint masking, which allows the network to learn how to reconstruct invisible regions so that the model not only becomes robust to perturbations but also generates globally consistent output depth maps. Experiments on the KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component in our pipeline, robustness to the use of fewer and fewer annotated images, and superior results compared to other state-of-the-art semi-supervised methods for monocular depth estimation. Our code is available at https://github.com/KU-CVLAB/MaskingDepth.

* Project page: https://github.com/KU-CVLAB/MaskingDepth
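
The abstract names K-way disjoint masking without giving its details, so the following is only a minimal sketch of one plausible reading: the patch tokens of an image are randomly partitioned into K subsets that do not overlap and together cover every patch. The function name `k_way_disjoint_masks`, the patch count, and the use of NumPy are assumptions made for illustration and are not taken from the authors' code; whether each subset is the part shown to the network or the part hidden from it is also left open here.

```python
import numpy as np


def k_way_disjoint_masks(num_patches: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Randomly partition patch indices into k disjoint subsets.

    Returns a boolean array of shape (k, num_patches) where entry [g, i]
    is True when patch i belongs to subset g. Hypothetical helper, not the
    paper's implementation.
    """
    if not 1 <= k <= num_patches:
        raise ValueError("k must be between 1 and num_patches")
    # Shuffle the patch indices, then cut the shuffled order into k chunks.
    order = rng.permutation(num_patches)
    masks = np.zeros((k, num_patches), dtype=bool)
    for group, idx in enumerate(np.array_split(order, k)):
        masks[group, idx] = True
    return masks


rng = np.random.default_rng(0)
masks = k_way_disjoint_masks(num_patches=16, k=4, rng=rng)
# Every patch is assigned to exactly one subset.
coverage = masks.sum(axis=0)
```

Under this reading, each of the K subsets would feed a separate forward pass, and a consistency term would compare the resulting depth predictions; that training loop is outside what the abstract specifies and is not sketched here.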

InstaFormer: Instance-Aware Image-to-Image Translation with Transformer

Mar 30, 2022

Soohyun Kim, Jongbeom Baek, Jihye Park, Gyeongnyeon Kim, Seungryong Kim

Abstract: We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By treating content features extracted from an image as tokens, our network discovers a global consensus of content features, taking context information into account through the self-attention module in Transformers. By augmenting these tokens with an instance-level feature extracted from the content feature using bounding box information, our framework is capable of learning the interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality in object regions, we present an instance-level content contrastive loss defined between the input and translated images. We conduct experiments to demonstrate the effectiveness of our InstaFormer over the latest methods and provide extensive ablation studies.

* Accepted to CVPR 2022
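
The replacement of LayerNorm with AdaIN mentioned in the abstract can be illustrated with the standard AdaIN operation applied to a token sequence. This is a sketch under stated assumptions, not the InstaFormer implementation: the function name `adain`, the `(num_tokens, channels)` layout, and passing `scale` and `shift` directly (rather than predicting them from a style code with a learned layer) are all choices made here for illustration.

```python
import numpy as np


def adain(tokens: np.ndarray, scale: np.ndarray, shift: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalization over a token sequence.

    tokens: (num_tokens, channels) content features.
    scale, shift: (channels,) style parameters; in a full model these would
    be derived from a style code (assumed, not shown).
    """
    # Per-channel statistics across tokens (instance-norm style), in
    # contrast to LayerNorm, which normalizes across channels per token.
    mean = tokens.mean(axis=0, keepdims=True)
    std = tokens.std(axis=0, keepdims=True)
    normalized = (tokens - mean) / (std + eps)
    # The style parameters replace LayerNorm's fixed learned affine terms.
    return scale * normalized + shift


rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 8))
scale = np.full(8, 2.0)
shift = np.full(8, 0.5)
styled = adain(tokens, scale, shift)
```

After the call, each channel of `styled` has approximately the mean and standard deviation given by `shift` and `scale`, which is how different style codes can yield different translations of the same content tokens.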

Semi-Supervised Learning with Mutual Distillation for Monocular Depth Estimation

Mar 18, 2022

Jongbeom Baek, Gyeongnyeon Kim, Seungryong Kim

Abstract: We propose a semi-supervised learning framework for monocular depth estimation. Whereas existing semi-supervised learning methods inherit the limitations of both sparse supervised and unsupervised loss functions, we obtain the complementary advantages of both by building a separate network branch for each loss and having the two branches distill each other through a mutual distillation loss function. We also propose applying different data augmentations to each branch, which improves robustness. We conduct experiments to demonstrate the effectiveness of our framework over the latest methods and provide extensive ablation studies.

* IEEE International Conference on Robotics and Automation (ICRA) 2022
