Abstract: Vision-Language Models (VLMs) face a critical bottleneck in precise numerical prediction for 3D scene understanding. Traditional reinforcement learning (RL) approaches, which rely primarily on relative ranking, often suffer from severe reward sparsity and gradient instability, and they fail to exploit the verifiable signals provided by 3D physical constraints. In standard GRPO frameworks in particular, relative normalization causes "near-miss" samples (those with small but non-zero errors) to suffer advantage collapse. The result is a data-utilization bottleneck in which valuable boundary samples are effectively discarded during optimization. To address this, we introduce the Smooth Numerical Reward Activation (SNRA) operator and the Absolute-Preserving GRPO (AP-GRPO) framework. SNRA employs a dynamically parameterized Sigmoid function to transform raw numerical feedback into a dense, continuous reward signal. AP-GRPO additionally integrates absolute scalar gradients to mitigate the loss of numerical information inherent in conventional relative-ranking mechanisms. Building on this approach, we construct Numerical3D-50k, a dataset of 50,000 verifiable 3D subtasks. Empirical results indicate that AP-GRPO matches the performance of large-scale supervised methods while being more data-efficient, activating latent 3D reasoning in VLMs without architectural modifications.
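The advantage-collapse argument can be made concrete with a toy group of rollouts. The following is a minimal sketch under stated assumptions, not the authors' implementation: the SNRA form (a sigmoid of relative error with hypothetical `scale` and `sharpness` parameters), the AP-GRPO advantage (the group-normalized GRPO advantage plus a hypothetical absolute term `lam * (r - baseline)`), and all function names are assumptions, since the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def binary_reward(pred: float, target: float, tol: float = 1e-6) -> float:
    """Sparse exact-match reward: 1 only if the prediction is (almost) exact."""
    return 1.0 if abs(pred - target) <= tol else 0.0


def snra_reward(pred: float, target: float,
                scale: float = 0.1, sharpness: float = 50.0) -> float:
    """Hypothetical SNRA-style reward: a sigmoid of the relative error.

    `scale` is the relative error at which the reward equals 0.5 and
    `sharpness` controls how fast it decays; both are assumed values,
    not the paper's dynamic parameterization.
    """
    rel_err = abs(pred - target) / max(abs(target), 1e-8)
    # Clamp the exponent so very large errors do not overflow math.exp.
    z = min(sharpness * (rel_err - scale), 700.0)
    # Close to 1 for small errors, decaying smoothly toward 0.
    return 1.0 / (1.0 + math.exp(z))


def grpo_advantages(rewards, eps=1e-6):
    """Standard group-relative advantage: (r - group mean) / (group std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ap_grpo_advantages(rewards, baseline=0.5, lam=1.0, eps=1e-6):
    """Hypothetical absolute-preserving variant: the relative advantage plus an
    absolute term lam * (r - baseline) that does not vanish when all rewards in
    a group are equal. `baseline` and `lam` are assumptions."""
    relative = grpo_advantages(rewards, eps)
    return [a + lam * (r - baseline) for a, r in zip(relative, rewards)]


if __name__ == "__main__":
    target = 10.0
    near_misses = [10.3, 10.4, 10.5, 10.8]  # small but non-zero errors

    sparse = [binary_reward(p, target) for p in near_misses]
    dense = [snra_reward(p, target) for p in near_misses]
    print("binary rewards: ", sparse, "-> GRPO:", grpo_advantages(sparse))
    print("SNRA rewards:   ", [round(r, 3) for r in dense])
    print("GRPO advantages:", [round(a, 3) for a in grpo_advantages(dense)])

    tied = [snra_reward(10.3, target)] * 4  # identical rewards within a group
    print("tied GRPO:      ", grpo_advantages(tied))
    print("tied AP-GRPO:   ", [round(a, 3) for a in ap_grpo_advantages(tied)])
```

With the binary reward, all four near-misses score 0, so every GRPO advantage is exactly zero and the group contributes no gradient. The sigmoid reward ranks the same samples by error, and the absolute term keeps a non-zero signal even when every reward in a group coincides, where pure relative normalization again returns zeros.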

Abstract: This paper presents RESIN, a decentralized Gaussian Process (GP) learning, fusion, and planning formalism that enables mobile sensor networks to actively learn target motion models. RESIN is efficient in both computation and communication, and it is robust to rumor propagation in sensor networks. Using the weighted exponential product rule and the Chernoff information, we develop a rumor-robust decentralized GP fusion approach that produces a globally consistent target trajectory prediction from local GP models. We then propose a decentralized, information-driven path planning approach that lets mobile sensors generate informative sensing paths. For path coordination between sensors, we develop a novel constant-size information-sharing strategy and derive an analytical objective function that significantly reduces the computational complexity of path planning. The effectiveness of RESIN is demonstrated in a range of numerical simulations.
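For intuition, the weighted exponential product (WEP) rule can be sketched for the simplest case of two scalar Gaussian predictions, for example two sensors' local GP posteriors for the target position at a single time. This is an illustrative assumption rather than RESIN's fusion procedure: it fuses only two 1-D marginals, chooses the weight by a grid search that maximizes the Chernoff information (a standard criterion for WEP fusion), and all function names are hypothetical. A naive product is included for contrast: fusing an estimate with a copy of itself (a rumor echoed back by a neighbor) leaves it unchanged under WEP, whereas the naive product halves its variance.

```python
import math


def chernoff_exponent(w, mu1, var1, mu2, var2):
    """k(w) = -ln of the integral of p1(x)^w * p2(x)^(1-w) for 1-D Gaussians
    p1 = N(mu1, var1) and p2 = N(mu2, var2) (standard closed form)."""
    mix_var = (1.0 - w) * var1 + w * var2
    quad = 0.5 * w * (1.0 - w) * (mu1 - mu2) ** 2 / mix_var
    log_det = 0.5 * (math.log(mix_var) - (1.0 - w) * math.log(var1) - w * math.log(var2))
    return quad + log_det


def chernoff_weight(mu1, var1, mu2, var2, grid=1001):
    """Weight maximizing k(w), i.e. attaining the Chernoff information (grid search)."""
    ws = [i / (grid - 1) for i in range(grid)]
    return max(ws, key=lambda w: chernoff_exponent(w, mu1, var1, mu2, var2))


def wep_fuse(mu1, var1, mu2, var2, w):
    """Renormalized weighted exponential product p1^w * p2^(1-w).

    For Gaussians the result is Gaussian, with precision equal to the
    w-weighted combination of the two input precisions."""
    prec = w / var1 + (1.0 - w) / var2
    mean = (w * mu1 / var1 + (1.0 - w) * mu2 / var2) / prec
    return mean, 1.0 / prec


def naive_product_fuse(mu1, var1, mu2, var2):
    """Plain product p1 * p2 (assumes independent sources), for comparison."""
    prec = 1.0 / var1 + 1.0 / var2
    return (mu1 / var1 + mu2 / var2) / prec, 1.0 / prec


if __name__ == "__main__":
    # Two hypothetical local predictions of the target position at one time.
    w = chernoff_weight(0.0, 1.0, 2.0, 1.0)
    print("Chernoff weight:", w, "fused:", wep_fuse(0.0, 1.0, 2.0, 1.0, w))

    # A rumor: the same estimate received twice.
    print("naive product:", naive_product_fuse(1.0, 2.0, 1.0, 2.0))  # variance halved
    print("WEP:          ", wep_fuse(1.0, 2.0, 1.0, 2.0, 0.5))       # unchanged
```

Because the WEP weights sum to one, the fused precision is a convex combination of the input precisions, so the fused variance always lies between the two input variances and repeated information cannot make the estimate spuriously overconfident.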