Abstract:Vehicle-Infrastructure Collaborative Perception (VICP) is pivotal for resolving occlusion in autonomous driving, yet the trade-off between communication bandwidth and feature redundancy remains a critical bottleneck. While intermediate fusion mitigates data volume compared to raw sharing, existing frameworks typically rely on spatial compression or static confidence maps, which inefficiently transmit spatially redundant features from non-critical background regions. To address this, we propose Risk-intent Selective detection (RiSe), an interaction-aware framework that shifts the paradigm from identifying visible regions to prioritizing risk-critical ones. Specifically, we introduce a Potential Field-Trajectory Correlation Model (PTCM) grounded in potential field theory to quantitatively assess kinematic risks. Complementing this, an Intention-Driven Area Prediction Module (IDAPM) leverages ego-motion priors to proactively predict and filter key Bird's-Eye-View (BEV) areas essential for decision-making. By integrating these components, RiSe implements a semantic-selective fusion scheme that transmits high-fidelity features only from high-interaction regions, effectively acting as a feature denoiser. Extensive experiments on the DeepAccident dataset demonstrate that our method reduces communication volume to 0.71\% of full feature sharing while maintaining state-of-the-art detection accuracy, establishing a competitive Pareto frontier between bandwidth efficiency and perception performance.




Abstract:Permanent magnet tracking using the external sensor array is crucial for the accurate localization of wireless capsule endoscope robots. Traditional tracking algorithms, based on the magnetic dipole model and Levenberg-Marquardt (LM) algorithm, face challenges related to computational delays and the need for initial position estimation. More recently proposed neural network-based approaches often require extensive hardware calibration and real-world data collection, which are time-consuming and labor-intensive. To address these challenges, we propose MobilePosenet, a lightweight neural network architecture that leverages depthwise separable convolutions to minimize computational cost and a channel attention mechanism to enhance localization accuracy. Besides, the inputs to the network integrate the sensors' coordinate information and random noise, compensating for the discrepancies between the theoretical model and the actual magnetic fields and thus allowing MobilePosenet to be trained entirely on theoretical data. Experimental evaluations conducted in a \(90 \times 90 \times 80\) mm workspace demonstrate that MobilePosenet exhibits excellent 5-DOF localization accuracy ($1.54 \pm 1.03$ mm and $2.24 \pm 1.84^{\circ}$) and inference speed (0.9 ms) against state-of-the-art methods trained on real-world data. Since network training relies solely on theoretical data, MobilePosenet can eliminate the hardware calibration and real-world data collection process, improving the generalizability of this permanent magnet localization method and the potential for rapid adoption in different clinical settings.