Abstract:The spatial attention mechanism captures long-range dependencies by aggregating global contextual information for each query location, which is beneficial for semantic segmentation. In this paper, we present a sparse spatial attention network (SSANet) to improve the efficiency of the spatial attention mechanism without sacrificing performance. Specifically, a sparse non-local (SNL) block is proposed that samples a subset of key and value elements for each query element to capture long-range relations adaptively, and generates a sparse affinity matrix to aggregate contextual information efficiently. Experimental results show that the proposed approach outperforms other context aggregation methods and achieves state-of-the-art performance on the Cityscapes, PASCAL Context and ADE20K datasets.
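The idea of attending to only a sampled subset of keys and values per query can be sketched as follows. This is a minimal NumPy illustration, not the SNL block itself: the function name `sparse_attention`, the softmax dot-product affinity, and the caller-supplied `sample_idx` are assumptions made here, and the adaptive sampler described in the abstract is not modelled.

```python
import numpy as np

def sparse_attention(query, key, value, sample_idx):
    """Aggregate context for each query from a sampled subset of positions.

    query:      (N, C) query features, one row per spatial location
    key:        (N, C) key features
    value:      (N, D) value features
    sample_idx: (N, S) integer indices; row i lists the S key/value positions
                used for query i (how they are chosen is left to the caller,
                which is an assumption of this sketch)
    """
    sampled_keys = key[sample_idx]      # (N, S, C)
    sampled_values = value[sample_idx]  # (N, S, D)
    # Affinity between each query and only its S sampled keys: shape (N, S),
    # instead of the dense (N, N) affinity matrix of full spatial attention.
    logits = np.einsum("nc,nsc->ns", query, sampled_keys)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum of the sampled values: shape (N, D)
    return np.einsum("ns,nsd->nd", weights, sampled_values)
```

With S much smaller than N, the affinity matrix holds N x S entries rather than N x N, which is where the efficiency gain would come from in a scheme of this kind.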
Abstract:Graph convolutional networks (GCNs) have achieved great success in dealing with data of non-Euclidean structures. Their success is directly attributable to fitting graph structures effectively to data such as those found in social media and knowledge databases. For image processing applications, the use of graph structures and GCNs has not been fully explored. In this paper, we propose a novel encoder-decoder network with added graph convolutions, converting feature maps to vertices of a pre-generated graph to synthetically construct graph-structured data. By doing this, we implicitly apply graph Laplacian regularization to the feature maps, making them more structured. The experiments show that it significantly boosts performance for image restoration tasks, including deblurring and super-resolution. We believe it opens up opportunities for GCN-based approaches in more applications.
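For reference, the graph Laplacian regularizer mentioned above is usually written in the following standard form. The notation is assumed here rather than taken from the paper: X is the matrix whose rows x_i are the vertex features, W is a symmetric adjacency (weight) matrix, D is the diagonal degree matrix, and L = D - W.

```latex
\operatorname{tr}\!\left(X^{\top} L X\right)
  = \frac{1}{2} \sum_{i,j} w_{ij} \,\lVert x_i - x_j \rVert_2^{2},
\qquad L = D - W, \quad D_{ii} = \sum_{j} w_{ij}.
```

The term is small when vertices joined by heavily weighted edges carry similar features, which is one way to read the claim that the feature maps become "more structured".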
Abstract:In recent years, Unmanned Surface Vehicles (USVs) have been extensively deployed for maritime applications. However, a USV has a limited detection range because its sensor is installed at the same elevation as the targets. In this research, we propose a cooperative Unmanned Aerial Vehicle - Unmanned Surface Vehicle (UAV-USV) platform to improve the detection range of the USV. A floatable and waterproof UAV is designed and 3D printed, which allows it to land on the sea. A catamaran USV and a landing platform are also developed. To land the UAV on the USV precisely in various lighting conditions, an IR beacon detector and an IR beacon are installed on the UAV and the USV, respectively. Finally, a two-phase UAV precise landing method, a USV control algorithm and a USV path following algorithm are proposed and tested.
Abstract:This paper presents the development of a control system for vision-guided pick-and-place tasks using a robot arm equipped with a 3D camera. The main steps include camera intrinsic and extrinsic calibration, hand-eye calibration, initial object pose registration, object pose alignment, and pick-and-place execution. The proposed system allows the robot to pick and place objects after registering a new object only a limited number of times, and the developed software can be adapted quickly to new object scenarios. The integrated system was tested using a hardware combination of a KUKA iiwa arm, Robotiq grippers (a two-finger gripper and a three-finger gripper) and 3D cameras (Intel RealSense D415, Intel RealSense D435, and Microsoft Kinect V2). The whole system can also be modified to work with other combinations of robotic arm, gripper and 3D camera.
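The role of hand-eye calibration in such a pipeline can be illustrated with a short sketch of the transform chain that brings an object pose from the camera frame into the robot base frame. It assumes an eye-in-hand setup (camera mounted on the end effector), which the abstract does not state; the function and variable names are hypothetical.

```python
import numpy as np

def make_transform(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def object_pose_in_base(T_base_ee, T_ee_cam, T_cam_obj):
    """Chain the transforms base -> end effector -> camera -> object.

    T_base_ee: end-effector pose in the base frame (from forward kinematics)
    T_ee_cam:  camera pose in the end-effector frame (hand-eye calibration)
    T_cam_obj: object pose in the camera frame (from pose registration)
    """
    return T_base_ee @ T_ee_cam @ T_cam_obj
```

For an eye-to-hand setup (camera fixed in the workspace) the chain would instead start from a fixed base-to-camera transform.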
Abstract:This paper presents a sensor-level mapless collision avoidance algorithm for mobile robots that maps raw sensor data to linear and angular velocities and navigates in an unknown environment without a map. An efficient training strategy is proposed to allow a robot to learn from both human experience data and self-exploratory data. A game-format simulation framework is designed to allow a human player to tele-operate the mobile robot to a goal, and the human actions are also scored using the reward function. Both human player data and self-playing data are sampled using the prioritized experience replay algorithm. The proposed algorithm and training strategy have been evaluated in two different experimental configurations: Environment 1, a simulated cluttered environment, and Environment 2, a simulated corridor environment. The proposed method achieved the same level of reward using only 16% of the training steps required by the standard Deep Deterministic Policy Gradient (DDPG) method in Environment 1 and 20% of those required in Environment 2. In an evaluation of 20 random missions, the proposed method achieved no collisions after less than 2 h and 2.5 h of training time in the two Gazebo environments, respectively. The method also generated smoother trajectories than DDPG. The proposed method has also been implemented on a real robot in a real-world environment for performance evaluation. We can confirm that the model trained in simulation can be applied directly to the real-world scenario without further fine-tuning, further demonstrating its higher robustness compared with DDPG. The video and code are available at: https://youtu.be/BmwxevgsdGc https://github.com/hanlinniu/turtlebot3_ddpg_collision_avoidance
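The sampling step of prioritized experience replay can be sketched as below, using the common proportional rule in which transition i is drawn with probability p_i^alpha / sum_k p_k^alpha. This is a generic illustration, not the code from the linked repository: the function name, the default `alpha`, and the idea of keeping human and self-play transitions in one shared priority array are assumptions.

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, rng=None):
    """Draw replay-buffer indices with probability proportional to p_i ** alpha.

    priorities: one non-negative priority per stored transition; in this
                sketch, human-demonstration and self-play transitions are
                assumed to share a single array
    alpha:      0 gives uniform sampling, 1 gives fully proportional sampling
    """
    rng = np.random.default_rng() if rng is None else rng
    priorities = np.asarray(priorities, dtype=float)
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    # Sampling is done with replacement, as is usual for replay minibatches.
    indices = rng.choice(len(priorities), size=batch_size, p=probs)
    return indices, probs
```

Transitions with larger priority (typically a larger temporal-difference error) are replayed more often, so informative human demonstrations and informative self-play experience would both be revisited more frequently than in uniform replay.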
Abstract:Timely disaster risk management requires accurate road maps and prompt damage assessment. Currently, this is done by volunteers manually marking satellite imagery of affected areas, but this process is slow and often error-prone. Segmentation algorithms can be applied to satellite images to detect road networks. However, existing methods are unsuitable for disaster-struck areas as they make assumptions about the road network topology which may no longer be valid in these scenarios. Herein, we propose a CNN-based framework for identifying accessible roads in post-disaster imagery by detecting changes from pre-disaster imagery. Graph theory is combined with the CNN output to detect semantic changes in road networks using OpenStreetMap data. Our results are validated with data from a tsunami-affected region in Palu, Indonesia, acquired from DigitalGlobe.
Abstract:Satellite images are an extremely valuable resource in the aftermath of natural disasters such as hurricanes and tsunamis, where they can be used for risk assessment and disaster management. In order to provide timely and actionable information for disaster response, this paper proposes a framework utilising segmentation neural networks to identify impacted areas and accessible roads in post-disaster scenarios. The effectiveness of pretraining with ImageNet on the task of aerial image segmentation has been analysed, and the performance of popular segmentation models has been compared. Experimental results show that pretraining on ImageNet usually improves the segmentation performance for a number of models. Open data available from OpenStreetMap (OSM) is used for training, forgoing the need for time-consuming manual annotation. The method also makes use of graph theory to update road network data available from OSM and to detect the changes caused by a natural disaster. Extensive experiments on data from the 2018 tsunami that struck Palu, Indonesia, show the effectiveness of the proposed framework. ENetSeparable, with 30% fewer parameters than ENet, achieved segmentation results comparable to those of state-of-the-art networks.
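One simple way graph theory can be combined with a segmentation output to reason about accessible roads is sketched below: road segments flagged as blocked are dropped from a pre-disaster road graph, and a breadth-first search finds what is still reachable. This is an illustrative sketch only; the function name, the edge-list representation, and the notion of a `blocked` list derived from the network output are assumptions, not the authors' procedure.

```python
from collections import deque

def reachable_nodes(edges, blocked, start):
    """Return the junctions reachable from `start` when blocked roads are removed.

    edges:   iterable of (u, v) pairs, the undirected pre-disaster road graph
             (for example, derived from OSM data)
    blocked: iterable of (u, v) pairs judged impassable in post-disaster
             imagery (hypothetical output of the segmentation stage)
    """
    blocked = {frozenset(edge) for edge in blocked}
    adjacency = {}
    for u, v in edges:
        if frozenset((u, v)) in blocked:
            continue  # skip roads that are no longer passable
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

Comparing the reachable set before and after removing blocked edges gives a basic picture of which parts of the network a disaster has cut off.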
Abstract:LiDAR provides highly accurate 3D point clouds. However, the data needs to be manually labelled before it can provide useful information for subsequent tasks. Manual annotation of such data is time-consuming, tedious and error-prone, and hence in this paper we present three automatic methods for annotating trees in LiDAR data. The first method requires high-density point clouds and uses certain LiDAR data attributes for the purpose of tree identification, achieving almost 90% accuracy. The second method uses a voxel-based 3D Convolutional Neural Network on low-density LiDAR datasets and is able to identify most large trees accurately but struggles with smaller ones due to the voxelisation process. The third method is a scaled version of the PointNet++ method that works directly on outdoor point clouds and achieves an F-score of 82.1% on the ISPRS benchmark dataset, comparable to the state-of-the-art methods but with increased efficiency.
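The effect of voxelisation noted for the second method can be illustrated with a minimal occupancy-grid sketch. The function name, the binary occupancy encoding, and the parameters are assumptions made for illustration; the abstract does not specify the voxel representation used.

```python
import numpy as np

def voxelise(points, voxel_size, grid_shape):
    """Convert an (N, 3) point cloud into a boolean occupancy grid.

    Points are shifted so the minimum corner sits at the grid origin, then
    binned into cubes of side `voxel_size`. Points that fall outside
    `grid_shape` are dropped.
    """
    points = np.asarray(points, dtype=float)
    idx = np.floor((points - points.min(axis=0)) / voxel_size).astype(int)
    inside = np.all(idx < np.array(grid_shape), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid
```

Because every point inside a cell collapses to a single occupied voxel, an object only a few voxels across keeps very little of its shape, which is consistent with the reported difficulty on smaller trees.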
Abstract:Explainability is an important factor in driving user trust in the use of neural networks for tasks with material impact. However, most of the work done in this area focuses on image analysis and does not take into account 3D data. We extend the saliency methods that have been shown to work on image data to deal with 3D data. We analyse the features in point clouds and voxel spaces and show that edges and corners in 3D data are deemed important features while planar surfaces are deemed less important. The approach is model-agnostic and can provide useful information about learnt features. Driven by the insight that 3D data is inherently sparse, we visualise the features learnt by a voxel-based classification network and show that these features are also sparse and can be pruned relatively easily, leading to more efficient neural networks. Our results show that the Voxception-ResNet model can be pruned down to 5% of its parameters with negligible loss in accuracy.
Abstract:A semantic feature extraction method for multitemporal high-resolution aerial image registration is proposed in this paper. These features encode properties or information about temporally invariant objects such as roads and help deal with issues such as changing foliage in image registration, which classical handcrafted features are unable to address. The features are extracted from a semantic segmentation network and, in the experiments, show good robustness and accuracy in registering aerial images across years and seasons.