Abstract:Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images capture high-resolution views of the retina with typically 200 spanning degrees. Accurate segmentation of vessels in UWF-SLO images is essential for detecting and diagnosing fundus disease. Recent studies have revealed that the selective State Space Model (SSM) in Mamba performs well in modeling long-range dependencies, which is crucial for capturing the continuity of elongated vessel structures. Inspired by this, we propose the first Serpentine Mamba (Serp-Mamba) network to address this challenging task. Specifically, we recognize the intricate, varied, and delicate nature of the tubular structure of vessels. Furthermore, the high-resolution of UWF-SLO images exacerbates the imbalance between the vessel and background categories. Based on the above observations, we first devise a Serpentine Interwoven Adaptive (SIA) scan mechanism, which scans UWF-SLO images along curved vessel structures in a snake-like crawling manner. This approach, consistent with vascular texture transformations, ensures the effective and continuous capture of curved vascular structure features. Second, we propose an Ambiguity-Driven Dual Recalibration (ADDR) module to address the category imbalance problem intensified by high-resolution images. Our ADDR module delineates pixels by two learnable thresholds and refines ambiguous pixels through a dual-driven strategy, thereby accurately distinguishing vessels and background regions. Experiment results on three datasets demonstrate the superior performance of our Serp-Mamba on high-resolution vessel segmentation. We also conduct a series of ablation studies to verify the impact of our designs. Our code shall be released upon publication of this work.
Abstract:Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging outcomes in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging, time-consuming and expensive task. In response, this study introduces a pioneering framework that leverages a patch-based active domain adaptation approach. By actively recommending a few valuable image patches by the devised Cascade Uncertainty-Predominance (CUP) selection strategy for labeling and model-finetuning, our method significantly improves the accuracy of UWF-SLO vessel segmentation across diverse medical centers. In addition, we annotate and construct the first Multi-center UWF-SLO Vessel Segmentation (MU-VS) dataset to promote this topic research, comprising data from multiple institutions. This dataset serves as a valuable resource for cross-center evaluation, verifying the effectiveness and robustness of our approach. Experimental results demonstrate that our approach surpasses existing domain adaptation and active learning methods, considerably reducing the gap between the Upper and Lower bounds with minimal annotations, highlighting our method's practical clinical value. We will release our dataset and code to facilitate relevant research: https://github.com/whq-xxh/SFADA-UWF-SLO.
Abstract:Graph learning has a wide range of applications in many scenarios, which require more need for data privacy. Federated learning is an emerging distributed machine learning approach that leverages data from individual devices or data centers to improve the accuracy and generalization of the model, while also protecting the privacy of user data. Graph-federated learning is mainly based on the classical federated learning framework i.e., the Client-Server framework. However, the Client-Server framework faces problems such as a single point of failure of the central server and poor scalability of network topology. First, we introduce the decentralized framework to graph-federated learning. Second, determine the confidence among nodes based on the similarity of data among nodes, subsequently, the gradient information is then aggregated by linear weighting based on confidence. Finally, the proposed method is compared with FedAvg, Fedprox, GCFL, and GCFL+ to verify the effectiveness of the proposed method. Experiments demonstrate that the proposed method outperforms other methods.
Abstract:3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to the frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively exploited due to the capability of recording surrounding environments from one camera exposure. Classical 3D reconstruction pipelines, however, cannot be used for spherical images. Besides, there exist few software packages for 3D reconstruction of spherical images. Based on the imaging geometry of spherical cameras, this study investigates the algorithms for the relative orientation using spherical correspondences, absolute orientation using 3D correspondences between scene and spherical points, and the cost functions for BA (bundle adjustment) optimization. In addition, an incremental SfM (Structure from Motion) workflow has been proposed for spherical images using the above-mentioned algorithms. The proposed solution is finally verified by using three spherical datasets captured by both consumer-grade and professional spherical cameras. The results demonstrate that the proposed SfM workflow can achieve the successful 3D reconstruction of complex scenes and provide useful clues for the implementation in open-source software packages. The source code of the designed SfM workflow would be made publicly available.
Abstract:3D reconstruction plays an increasingly important role in modern photogrammetric systems. Conventional satellite or aerial-based remote sensing (RS) platforms can provide the necessary data sources for the 3D reconstruction of large-scale landforms and cities. Even with low-altitude UAVs (Unmanned Aerial Vehicles), 3D reconstruction in complicated situations, such as urban canyons and indoor scenes, is challenging due to frequent tracking failures between camera frames and high data collection costs. Recently, spherical images have been extensively used due to the capability of recording surrounding environments from one camera exposure. In contrast to perspective images with limited FOV (Field of View), spherical images can cover the whole scene with full horizontal and vertical FOV and facilitate camera tracking and data acquisition in these complex scenes. With the rapid evolution and extensive use of professional and consumer-grade spherical cameras, spherical images show great potential for the 3D modeling of urban and indoor scenes. Classical 3D reconstruction pipelines, however, cannot be directly used for spherical images. Besides, there exist few software packages that are designed for the 3D reconstruction of spherical images. As a result, this research provides a thorough survey of the state-of-the-art for 3D reconstruction of spherical images in terms of data acquisition, feature detection and matching, image orientation, and dense matching as well as presenting promising applications and discussing potential prospects. We anticipate that this study offers insightful clues to direct future research.
Abstract:Incremental Structure from Motion (ISfM) has been widely used for UAV image orientation. Its efficiency, however, decreases dramatically due to the sequential constraint. Although the divide-and-conquer strategy has been utilized for efficiency improvement, cluster merging becomes difficult or depends on seriously designed overlap structures. This paper proposes an algorithm to extract the global model for cluster merging and designs a parallel SfM solution to achieve efficient and accurate UAV image orientation. First, based on vocabulary tree retrieval, match pairs are selected to construct an undirected weighted match graph, whose edge weights are calculated by considering both the number and distribution of feature matches. Second, an algorithm, termed weighted connected dominating set (WCDS), is designed to achieve the simplification of the match graph and build the global model, which incorporates the edge weight in the graph node selection and enables the successful reconstruction of the global model. Third, the match graph is simultaneously divided into compact and non-overlapped clusters. After the parallel reconstruction, cluster merging is conducted with the aid of common 3D points between the global and cluster models. Finally, by using three UAV datasets that are captured by classical oblique and recent optimized views photogrammetry, the validation of the proposed solution is verified through comprehensive analysis and comparison. The experimental results demonstrate that the proposed parallel SfM can achieve 17.4 times efficiency improvement and comparative orientation accuracy. In absolute BA, the geo-referencing accuracy is approximately 2.0 and 3.0 times the GSD (Ground Sampling Distance) value in the horizontal and vertical directions, respectively. For parallel SfM, the proposed solution is a more reliable alternative.
Abstract:In this paper, we present a novel stereo visual inertial pose estimation method. Compared to the widely used filter-based or optimization-based approaches, the pose estimation process is modeled as a control system. Designed feedback or feedforward loops are introduced to achieve the stable control of the system, which include a gradient decreased feedback loop, a roll-pitch feed forward loop and a bias estimation feedback loop. This system, named FLVIS (Feedforward-feedback Loop-based Visual Inertial System), is evaluated on the popular EuRoc MAV dataset. FLVIS achieves high accuracy and robustness with respect to other state-of-the-art visual SLAM approaches. The system has also been implemented and tested on a UAV platform. The source code of this research is public to the research community.