Abstract:Purpose: Common dense stereo Simultaneous Localization and Mapping (SLAM) approaches in Minimally Invasive Surgery (MIS) require high-end parallel computational resources for real-time implementation. Yet, it is not always feasible since the computational resources should be allocated to other tasks like segmentation, detection, and tracking. To solve the problem of limited parallel computational power, this research aims at a lightweight dense stereo SLAM system that works on a single-core CPU and achieves real-time performance (more than 30 Hz in typical scenarios). Methods: A new dense stereo mapping module is integrated with the ORB-SLAM2 system and named BDIS-SLAM. Our new dense stereo mapping module includes stereo matching and 3D dense depth mosaic methods. Stereo matching is achieved with the recently proposed CPU-level real-time matching algorithm Bayesian Dense Inverse Searching (BDIS). A BDIS-based shape recovery and a depth mosaic strategy are integrated as a new thread and coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery. Results: Experiments on in-vivo data sets show that BDIS-SLAM runs at over 30 Hz speed on modern single-core CPU in typical endoscopy/colonoscopy scenarios. BDIS-SLAM only consumes around an additional 12% time compared with the backbone ORB-SLAM2. Although our lightweight BDIS-SLAM simplifies the process by ignoring deformation and fusion procedures, it can provide a usable dense mapping for modern MIS on computationally constrained devices. Conclusion: The proposed BDIS-SLAM is a lightweight stereo dense SLAM system for MIS. It achieves 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios (image size around 640*480). BDIS-SLAM provides a low-cost solution for dense mapping in MIS and has the potential to be applied in surgical robots and AR systems.
Abstract:In stereoscope-based Minimally Invasive Surgeries (MIS), dense stereo matching plays an indispensable role in 3D shape recovery, AR, VR, and navigation tasks. Although numerous Deep Neural Network (DNN) approaches are proposed, the conventional prior-free approaches are still popular in the industry because of the lack of open-source annotated data set and the limitation of the task-specific pre-trained DNNs. Among the prior-free stereo matching algorithms, there is no successful real-time algorithm in none GPU environment for MIS. This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks. We achieve an average 17 Hz on 640*480 images with a single-core CPU (i5-9400) for surgical images. Meanwhile, it achieves slightly better accuracy than the popular ELAS. The patch-based fast disparity searching algorithm is adopted for the rectified stereo images. A coarse-to-fine Bayesian probability and a spatial Gaussian mixed model were proposed to evaluate the patch probability at different scales. An optional probability density function estimation algorithm was adopted to quantify the prediction variance. Extensive experiments demonstrated the proposed method's capability to handle ambiguities introduced by the textureless surfaces and the photometric inconsistency from the non-Lambertian reflectance and dark illumination. The estimated probability managed to balance the confidences of the patches for stereo images at different scales. It has similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while it is 4-5 times faster. The code and the synthetic data sets are available at https://github.com/JingweiSong/BDIS-v2.
Abstract:This paper reports a CPU-level real-time stereo matching method for surgical images (10 Hz on 640 * 480 image with a single core of i5-9400). The proposed method is built on the fast ''dense inverse searching'' algorithm, which estimates the disparity of the stereo images. The overlapping image patches (arbitrary squared image segment) from the images at different scales are aligned based on the photometric consistency presumption. We propose a Bayesian framework to evaluate the probability of the optimized patch disparity at different scales. Moreover, we introduce a spatial Gaussian mixed probability distribution to address the pixel-wise probability within the patch. In-vivo and synthetic experiments show that our method can handle ambiguities resulted from the textureless surfaces and the photometric inconsistency caused by the Lambertian reflectance. Our Bayesian method correctly balances the probability of the patch for stereo images at different scales. Experiments indicate that the estimated depth has higher accuracy and fewer outliers than the baseline methods in the surgical scenario.
Abstract:Drone imagery is increasingly used in automated inspection for infrastructure surface defects, especially in hazardous or unreachable environments. In machine vision, the key to crack detection rests with robust and accurate algorithms for image processing. To this end, this paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation (HCNNFP) and an intercontrast iterative thresholding algorithm for image binarization. First, a set of branch networks is proposed, wherein the output of previous convolutional blocks is half-sizedly concatenated to the current ones to reduce the obscuration in the down-sampling stage taking into account the overall information loss. Next, to extract the feature map generated from the enhanced HCNN, a binary contrast-based autotuned thresholding (CBAT) approach is developed at the post-processing step, where patterns of interest are clustered within the probability map of the identified features. The proposed technique is then applied to identify surface cracks on the surface of roads, bridges or pavements. An extensive comparison with existing techniques is conducted on various datasets and subject to a number of evaluation criteria including the average F-measure (AF\b{eta}) introduced here for dynamic quantification of the performance. Experiments on crack images, including those captured by unmanned aerial vehicles inspecting a monorail bridge. The proposed technique outperforms the existing methods on various tested datasets especially for GAPs dataset with an increase of about 1.4% in terms of AF\b{eta} while the mean percentage error drops by 2.2%. Such performance demonstrates the merits of the proposed HCNNFP architecture for surface defect inspection.
Abstract:Unmanned aerial vehicles (UAV) are expected to replace human in hazardous tasks of surface inspection due to their flexibility in operating space and capability of collecting high quality visual data. In this study, we propose enhanced hierarchical convolutional neural networks (HCNN) to detect cracks from image data collected by UAVs. Unlike traditional HCNN, here a set of branch networks is utilised to reduce the obscuration in the down-sampling process. Moreover, the feature preserving blocks combine the current and previous terms from the convolutional blocks to provide input to the loss functions. As a result, the weights of resized images can be reduced to minimise the information loss. Experiments on images of different crack datasets have been carried out to demonstrate the effectiveness of proposed HCNN.