Abstract:Ultrasound (US)-guided needle insertion is widely employed in percutaneous interventions. However, providing feedback on the needle tip position via US image presents challenges due to noise, artifacts, and the thin imaging plane of US, which degrades needle features and leads to intermittent tip visibility. In this paper, a Mamba-based US needle tracker MambaXCTrack utilizing structured state space models cross-correlation (SSMX-Corr) and implicit motion prompt is proposed, which is the first application of Mamba in US needle tracking. The SSMX-Corr enhances cross-correlation by long-range modeling and global searching of distant semantic features between template and search maps, benefiting the tracking under noise and artifacts by implicitly learning potential distant semantic cues. By combining with cross-map interleaved scan (CIS), local pixel-wise interaction with positional inductive bias can also be introduced to SSMX-Corr. The implicit low-level motion descriptor is proposed as a non-visual prompt to enhance tracking robustness, addressing the intermittent tip visibility problem. Extensive experiments on a dataset with motorized needle insertion in both phantom and tissue samples demonstrate that the proposed tracker outperforms other state-of-the-art trackers while ablation studies further highlight the effectiveness of each proposed tracking module.
Abstract:Gastric simulators with objective educational feedback have been proven useful for endoscopy training. Existing electronic simulators with feedback are however not commonly adopted due to their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable endoscope tip position feedback at a low cost inside a mechanical simulator for endoscopy skill evaluation, tackling several unique challenges. To address the issue of significant appearance variation of the endoscope tip while keeping dual-camera tracking consistency, the cross-camera mutual template strategy (CMT) is proposed to introduce dynamic transient mutual templates to dual-camera tracking. To alleviate disturbance from large occlusion and distortion by the light source from the endoscope tip, the Mamba-based motion-guided prediction head (MMH) is presented to aggregate visual tracking with historical motion information modeled by the state space model. The proposed tracker was evaluated on datasets captured by low-cost camera pairs during endoscopy procedures performed inside the mechanical simulator. The tracker achieves SOTA performance with robust and consistent tracking on dual cameras. Further downstream evaluation proves that the 3D tip position determined by the proposed tracker enables reliable skill differentiation. The code and dataset will be released upon acceptance.