Abstract: Spatial visual perception is a fundamental requirement in physical-world applications such as autonomous driving and robotic manipulation, which must interact with 3D environments. Capturing pixel-aligned metric depth with RGB-D cameras is the most practical approach, yet it is often hindered by hardware limitations and challenging imaging conditions, especially in the presence of specular or texture-less surfaces. In this work, we argue that the inaccuracies from depth sensors can be viewed as "masked" signals that inherently reflect underlying geometric ambiguities. Building on this motivation, we present LingBot-Depth, a depth completion model that leverages visual context to refine depth maps through masked depth modeling and incorporates an automated data curation pipeline for scalable training. Our model outperforms top-tier RGB-D cameras in both depth precision and pixel coverage. Experimental results on a range of downstream tasks further suggest that LingBot-Depth offers an aligned latent representation across RGB and depth modalities. We release the code, checkpoint, and 3M RGB-depth pairs (2M real and 1M simulated) to the spatial perception community.




Abstract: Graph neural networks (GNNs) have achieved state-of-the-art performance in many graph-related tasks, e.g., node classification. However, recent works show that GNNs are vulnerable to evasion attacks, i.e., an attacker can slightly perturb the graph structure to fool GNN models. Existing evasion attacks on GNNs have several key drawbacks: 1) they are limited to attacking two-layer GNNs; 2) they are not efficient; and/or 3) they require knowledge of the GNN model parameters. We address these drawbacks in this paper and propose an influence-based evasion attack against GNNs. Specifically, we first introduce two influence functions, i.e., feature-label influence and label influence, that are defined on GNNs and label propagation (LP), respectively. Then, we establish a strong connection between GNNs and LP in terms of influence. Next, we reformulate the evasion attack against GNNs in terms of calculating label influence on LP, which is applicable to multi-layer GNNs and does not require knowledge of the GNN model. We also propose an efficient algorithm to calculate label influence. Finally, we evaluate our influence-based attack on three benchmark graph datasets. Our experimental results show that, compared to state-of-the-art attacks, our attack achieves comparable attack performance with a 5-50x speedup when attacking two-layer GNNs. Moreover, our attack is also effective against multi-layer GNNs.
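
To make the notion of label influence on LP more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a simple linear label propagation Y^(K) = P^K Y^(0) with a row-normalized adjacency matrix P, in which case the sensitivity of node v's propagated label to node u's initial label is the entry (P^K)[v][u]. The function names (`row_normalize`, `label_influence`) and the toy graph are hypothetical and chosen only for illustration; the abstract does not give the exact definition of label influence or the efficient algorithm used to compute it.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(adj: Matrix) -> Matrix:
    """Turn an adjacency matrix into a row-stochastic propagation matrix P."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product, kept dependency-free for the sketch."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def label_influence(adj: Matrix, steps: int) -> Matrix:
    """Return P^steps for the assumed linear LP.

    Entry [v][u] is how strongly u's initial label contributes to v's
    label after `steps` propagation rounds (an assumption of this sketch,
    not the paper's stated definition).
    """
    n = len(adj)
    p = row_normalize(adj)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = matmul(p, result)
    return result


# Toy path graph 0 - 1 - 2 with self-loops (hypothetical example data).
ADJ = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]

ONE_STEP = label_influence(ADJ, steps=1)
TWO_STEP = label_influence(ADJ, steps=2)
```

In this toy graph, node 2 has no one-step influence on node 0 because they are not adjacent, but it gains influence after two steps through node 1. This illustrates why an influence computed on LP can extend naturally to deeper (multi-layer) propagation without any model parameters.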