Chiew-Lan Tai

LiDAL: Inter-frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation

Nov 11, 2022

Zeyu Hu, Xuyang Bai, Runze Zhang, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai

Abstract: We propose LiDAL, a novel active learning method for 3D LiDAR semantic segmentation by exploiting inter-frame uncertainty among LiDAR frames. Our core idea is that a well-trained model should generate robust results irrespective of viewpoints for scene scanning and thus the inconsistencies in model predictions across frames provide a very reliable measure of uncertainty for active sample selection. To implement this uncertainty measure, we introduce new inter-frame divergence and entropy formulations, which serve as the metrics for active selection. Moreover, we demonstrate additional performance gains by predicting and incorporating pseudo-labels, which are also selected using the proposed inter-frame uncertainty measure. Experimental results validate the effectiveness of LiDAL: we achieve 95% of the performance of fully supervised learning with less than 5% of annotations on the SemanticKITTI and nuScenes datasets, outperforming state-of-the-art active learning methods. Code release: https://github.com/hzykent/LiDAL.

* ECCV 2022, supplementary materials included
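The inter-frame uncertainty idea in the LiDAL abstract can be illustrated with a small sketch. It assumes that several frames have already been registered so that each frame contributes one class-probability vector for the same 3D location. The function names, and the choice of mean-distribution entropy and mean KL divergence, are illustrative assumptions and not necessarily the exact formulations used in the paper.

```python
import math

def mean_distribution(preds):
    """Average per-frame class probabilities for one 3D location."""
    n = len(preds)
    return [sum(col) / n for col in zip(*preds)]

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl(p, q):
    """KL divergence KL(p || q); terms with p == 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def inter_frame_entropy(preds):
    # Hypothetical score: entropy of the frame-averaged prediction.
    return entropy(mean_distribution(preds))

def inter_frame_divergence(preds):
    # Hypothetical score: mean KL between each frame and the average.
    mean = mean_distribution(preds)
    return sum(kl(p, mean) for p in preds) / len(preds)

# Two frames that agree versus two frames that disagree on the class.
agree = [[0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9]]
print(inter_frame_divergence(agree), inter_frame_divergence(disagree))
```

Under these assumptions, locations whose predictions flip between frames receive higher scores and would be prioritized for annotation.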

TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

Mar 22, 2022

Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, Chiew-Lan Tai

Abstract: LiDAR and camera are two important sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, robustness against inferior image conditions, e.g., bad illumination and sensor misalignment, is under-explored. Existing fusion methods are easily affected by such conditions, mainly due to a hard association of LiDAR points and image pixels, established by calibration matrices. We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. Specifically, TransFusion consists of convolutional backbones and a detection head based on a transformer decoder. The first layer of the decoder predicts initial bounding boxes from a LiDAR point cloud using a sparse set of object queries, and the second decoder layer adaptively fuses the object queries with useful image features, leveraging both spatial and contextual relationships. The attention mechanism of the transformer enables our model to adaptively determine where and what information should be taken from the image, leading to a robust and effective fusion strategy. We additionally design an image-guided query initialization strategy to deal with objects that are difficult to detect in point clouds. TransFusion achieves state-of-the-art performance on large-scale datasets. We provide extensive experiments to demonstrate its robustness against degenerated image quality and calibration errors. We also extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard, showing its effectiveness and generalization capability.

* Accepted to CVPR 2022; code at https://github.com/XuyangBai/TransFusion; based on this work, we achieve 1st place on the nuScenes tracking leaderboard
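The soft association described in the TransFusion abstract relies on transformer attention rather than a fixed point-to-pixel mapping. The sketch below shows plain scaled dot-product cross-attention between one object query and a few image feature vectors. It is a generic illustration only: the function names are hypothetical, and the learned projections, multiple heads, and any spatial weighting used in the actual model are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, keys, values):
    """Fuse image features into one object query with soft weights.

    query:  feature vector of a LiDAR-derived object query
    keys:   one key vector per image location
    values: one value vector per image location
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, fused

# Toy example: the first image location matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
weights, fused = cross_attend(query, keys, values)
print(weights, fused)
```

Because every image location receives a nonzero weight, a small calibration error shifts the weights gradually instead of pairing a point with the wrong pixel outright.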

Learning to Match Features with Seeded Graph Matching Network

Aug 19, 2021

Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan

Abstract: Matching local features across images is a fundamental problem in computer vision. Targeting high accuracy and efficiency, we propose Seeded Graph Matching Network, a graph neural network with a sparse structure to reduce redundant connectivity and learn compact representations. The network consists of 1) a Seeding Module, which initializes the matching by generating a small set of reliable matches as seeds, and 2) a Seeded Graph Neural Network, which utilizes seed matches to pass messages within/across images and predicts assignment costs. Three novel operations are proposed as basic elements for message passing: 1) Attentional Pooling, which aggregates keypoint features within the image to seed matches; 2) Seed Filtering, which enhances seed features and exchanges messages across images; and 3) Attentional Unpooling, which propagates seed features back to the original keypoints. Experiments show that our method significantly reduces computational and memory complexity compared with typical attention-based networks while achieving competitive or higher performance.

* Accepted by ICCV 2021, code to be released at https://github.com/vdvchen/SGMNet

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

Jul 29, 2021

Zeyu Hu, Xuyang Bai, Jiaxiang Shang, Runze Zhang, Jiayu Dong, Xin Wang, Guangyuan Sun, Hongbo Fu, Chiew-Lan Tai

Abstract: In recent years, sparse voxel-based methods have become the state of the art for 3D semantic segmentation of indoor scenes, thanks to powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations, leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters). Code release: https://github.com/hzykent/VMNet

* ICCV 2021 (Oral), supplementary materials included
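The distinction the VMNet abstract draws between Euclidean and geodesic information can be made concrete with a toy example. The sketch below approximates geodesic distance by the shortest path along mesh edges (Dijkstra's algorithm) and compares it with straight-line distance. This is only an illustration of the two notions of distance; the function name and the toy mesh are assumptions, and this is not how VMNet itself processes meshes.

```python
import heapq
import math

def geodesic_distance(vertices, edges, src, dst):
    """Shortest path length along mesh edges from src to dst.

    Returns math.inf when the two vertices lie on disconnected surfaces.
    """
    adjacency = {i: [] for i in range(len(vertices))}
    for a, b in edges:
        w = math.dist(vertices[a], vertices[b])
        adjacency[a].append((b, w))
        adjacency[b].append((a, w))

    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

# A thin folded strip: vertices 0 and 3 nearly touch in space,
# but the surface connecting them runs the long way around.
vertices = [(0, 0, 0), (1, 0, 0), (1, 0, 0.1), (0, 0, 0.1)]
edges = [(0, 1), (1, 2), (2, 3)]
print(math.dist(vertices[0], vertices[3]))
print(geodesic_distance(vertices, edges, 0, 3))
```

In this toy mesh the two end vertices are close in Euclidean terms yet far apart along the surface, which is the kind of case where geodesic cues can help separate spatially close objects.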

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Mar 09, 2021

Xuyang Bai, Zixin Luo, Lei Zhou, Hongkai Chen, Lei Li, Zeyu Hu, Hongbo Fu, Chiew-Lan Tai

Abstract: Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.

* Accepted to CVPR 2021, supplementary materials included
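The spatial consistency mentioned in the PointDSC abstract follows from the fact that a rigid (Euclidean) transformation preserves pairwise distances. The sketch below scores each pair of correspondences by how well the distance between their source points matches the distance between their target points. The truncated quadratic score and the `sigma` parameter are one common choice, assumed here for illustration, and the function name is hypothetical; the learned feature embedding and spectral matching modules of the paper are not shown.

```python
import math

def compatibility_matrix(corrs, sigma=0.1):
    """Pairwise spatial compatibility of correspondences.

    corrs: list of (source_point, target_point) pairs.
    A pair (i, j) scores close to 1 when the source-side and target-side
    distances agree, and 0 when they differ by sigma or more.
    """
    n = len(corrs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            (p_i, q_i), (p_j, q_j) = corrs[i], corrs[j]
            diff = abs(math.dist(p_i, p_j) - math.dist(q_i, q_j))
            matrix[i][j] = max(0.0, 1.0 - (diff / sigma) ** 2)
    return matrix

# Three correspondences related by a pure translation, plus one outlier.
corrs = [
    ((0, 0, 0), (2, 0, 0)),
    ((1, 0, 0), (3, 0, 0)),
    ((0, 1, 0), (2, 1, 0)),
    ((0, 0, 1), (5, 5, 5)),  # outlier
]
support = [sum(row) for row in compatibility_matrix(corrs)]
print(support)
```

Inliers tend to be mutually compatible and therefore accumulate support from one another, while an outlier is compatible with few or no other correspondences.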

End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds

Mar 16, 2020

Lei Li, Siyu Zhu, Hongbo Fu, Ping Tan, Chiew-Lan Tai

Abstract: In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds. To adopt a similar multi-view representation, existing studies use hand-crafted viewpoints for rendering in a preprocessing stage, which is detached from the subsequent descriptor learning stage. In our framework, we integrate the multi-view rendering into neural networks by using a differentiable renderer, which allows the viewpoints to be optimizable parameters for capturing more informative local context of interest points. To obtain discriminative descriptors, we also design a soft-view pooling module to attentively fuse convolutional features across views. Extensive experiments on existing 3D registration benchmarks show that our method outperforms existing local descriptors both quantitatively and qualitatively.

* CVPR 2020. Webpage: https://github.com/craigleili/3DLocalMultiViewDesc

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

Mar 06, 2020

Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, Chiew-Lan Tai

Abstract: Successful point cloud registration often relies on the robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been paid to the learning of 3D feature detectors, and even less to the joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on the 3DMatch and KITTI datasets, and shows its strong generalization ability on the ETH dataset. Towards practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment. Code release: https://github.com/XuyangBai/D3Feat

* Accepted to CVPR 2020, supplementary materials included

SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence

Jan 17, 2020

Deng Yu, Lei Li, Youyi Zheng, Manfred Lau, Yi-Zhe Song, Chiew-Lan Tai, Hongbo Fu

Abstract: In this paper, we study the problem of multi-view sketch correspondence, where we take as input multiple freehand sketches with different views of the same object and predict semantic correspondence among the sketches. This problem is challenging, since visual features of corresponding points at different views can be very different. To this end, we take a deep learning approach and learn a novel local sketch descriptor from data. We contribute a training dataset by generating the pixel-level correspondence for the multi-view line drawings synthesized from 3D shapes. To handle the sparsity and ambiguity of sketches, we design a novel multi-branch neural network that integrates a patch-based representation and a multi-scale strategy to learn the pixel-level correspondence among multi-view sketches. We demonstrate the effectiveness of our proposed approach with extensive experiments on hand-drawn sketches and multi-view line drawings rendered from multiple 3D shape datasets.

Sketch-R2CNN: An Attentive Network for Vector Sketch Recognition

Nov 20, 2018

Lei Li, Changqing Zou, Youyi Zheng, Qingkun Su, Hongbo Fu, Chiew-Lan Tai

Abstract: Freehand sketching is a dynamic process where points are sequentially sampled and grouped as strokes for sketch acquisition on electronic devices. To recognize a sketched object, most existing methods discard such important temporal ordering and grouping information from humans and simply rasterize sketches into binary images for classification. In this paper, we propose a novel single-branch attentive network architecture RNN-Rasterization-CNN (Sketch-R2CNN for short) to fully leverage the dynamics in sketches for recognition. Sketch-R2CNN takes as input only a vector sketch with grouped sequences of points, and uses an RNN for stroke attention estimation in the vector space and a CNN for 2D feature extraction in the pixel space, respectively. To bridge the gap between these two spaces in neural networks, we propose a neural line rasterization module to convert the vector sketch, along with the attention estimated by the RNN, into a bitmap image, which is subsequently consumed by the CNN. The neural line rasterization module is designed in a differentiable way to yield a unified pipeline for end-to-end learning. We perform experiments on existing large-scale sketch recognition benchmarks and show that by exploiting the sketch dynamics with the attention mechanism, our method is more robust and achieves better performance than the state-of-the-art methods.
