Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ruihan Xu

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Jul 01, 2025

Ruihan Xu, Haokui Zhang, Yaowei Wang, Wei Zeng, Shiliang Zhang

Abstract:The growing use of deep learning necessitates efficient network design and deployment, making neural predictors vital for estimating attributes such as accuracy and latency. Recently, Graph Neural Networks (GNNs) and transformers have shown promising performance in representing neural architectures. However, each of both methods has its disadvantages. GNNs lack the capabilities to represent complicated features, while transformers face poor generalization when the depth of architecture grows. To mitigate the above issues, we rethink neural architecture topology and show that sibling nodes are pivotal while overlooked in previous research. We thus propose a novel predictor leveraging the strengths of GNNs and transformers to learn the enhanced topology. We introduce a novel token mixer that considers siblings, and a new channel mixer named bidirectional graph isomorphism feed-forward network. Our approach consistently achieves promising performance in both accuracy and latency prediction, providing valuable insights for learning Directed Acyclic Graph (DAG) topology. The code is available at https://github.com/XuRuihan/NNFormer.

* Accepted to CVPR 2025. Code is avaiable at https://github.com/XuRuihan/NNFormer

Via

Access Paper or Ask Questions

What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?

Feb 11, 2025

Ruihan Xu, Yiping Lu

Abstract:Randomized sketching accelerates large-scale numerical linear algebra by reducing computa- tional complexity. While the traditional sketch-and-solve approach reduces the problem size di- rectly through sketching, the sketch-and-precondition method leverages sketching to construct a computational friendly preconditioner. This preconditioner improves the convergence speed of iterative solvers applied to the original problem, maintaining accuracy in the full space. Further- more, the convergence rate of the solver improves at least linearly with the sketch size. Despite its potential, developing a sketch-and-precondition framework for randomized algorithms in low- rank matrix approximation remains an open challenge. We introduce the Error-Powered Sketched Inverse Iteration (EPSI) Method via run sketched Newton iteration for the Lagrange form as a sketch-and-precondition variant for randomized low-rank approximation. Our method achieves theoretical guarantees, including a convergence rate that improves at least linearly with the sketch size.

Via

Access Paper or Ask Questions

Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Oct 15, 2024

Joey Wilson, Ruihan Xu, Yile Sun, Parker Ewen, Minghan Zhu, Kira Barton, Maani Ghaffari

Figure 1 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 2 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 3 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 4 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Abstract:This paper introduces a novel probabilistic mapping algorithm, Latent BKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of semantic classes. Latent BKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). Latent BKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular MatterPort-3D and Semantic KITTI data sets, demonstrating that Latent BKI maintains the probabilistic benefits of continuous mapping with the additional benefit of open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.

Via

Access Paper or Ask Questions

Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Sep 11, 2024

Ruihan Xu, Anthony Opipari, Joshua Mah, Stanley Lewis, Haoran Zhang, Hanzhe Guo, Odest Chadwicke Jenkins

Figure 1 for Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Figure 2 for Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Figure 3 for Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Figure 4 for Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks

Abstract:This paper introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs) as an approach for SO(2)-Equivariant 3D object reconstruction from single-view image observations. GSNs take a single observation as input to generate a Gaussian splat representation describing the observed object's geometry and texture. By using a shared feature extractor before decoding Gaussian colors, covariances, positions, and opacities, GSNs achieve extremely high throughput (>150FPS). Experiments demonstrate that GSNs can be trained efficiently using a multi-view rendering loss and are competitive, in quality, with expensive diffusion-based reconstruction algorithms. The GSN model is validated on multiple benchmark experiments. Moreover, we demonstrate the potential for GSNs to be used within a robotic manipulation pipeline for object-centric grasping.

* Accepted to RSS 2024 Workshop on Geometric and Algebraic Structure in Robot Learning

Via

Access Paper or Ask Questions

Stein Variational Belief Propagation for Multi-Robot Coordination

Nov 28, 2023

Jana Pavlasek, Joshua Jing Zhi Mah, Ruihan Xu, Odest Chadwicke Jenkins, Fabio Ramos

Abstract:Decentralized coordination for multi-robot systems involves planning in challenging, high-dimensional spaces. The planning problem is particularly challenging in the presence of obstacles and different sources of uncertainty such as inaccurate dynamic models and sensor noise. In this paper, we introduce Stein Variational Belief Propagation (SVBP), a novel algorithm for performing inference over nonparametric marginal distributions of nodes in a graph. We apply SVBP to multi-robot coordination by modelling a robot swarm as a graphical model and performing inference for each robot. We demonstrate our algorithm on a simulated multi-robot perception task, and on a multi-robot planning task within a Model-Predictive Control (MPC) framework, on both simulated and real-world mobile robots. Our experiments show that SVBP represents multi-modal distributions better than sampling-based or Gaussian baselines, resulting in improved performance on perception and planning tasks. Furthermore, we show that SVBP's ability to represent diverse trajectories for decentralized multi-robot planning makes it less prone to deadlock scenarios than leading baselines.

Via

Access Paper or Ask Questions

Boosted ab initio Cryo-EM 3D Reconstruction with ACE-EM

Feb 14, 2023

Lin Yao, Ruihan Xu, Zhifeng Gao, Guolin Ke, Yuhang Wang

Abstract:The central problem in cryo-electron microscopy (cryo-EM) is to recover the 3D structure from noisy 2D projection images which requires estimating the missing projection angles (poses). Recent methods attempted to solve the 3D reconstruction problem with the autoencoder architecture, which suffers from the latent vector space sampling problem and frequently produces suboptimal pose inferences and inferior 3D reconstructions. Here we present an improved autoencoder architecture called ACE (Asymmetric Complementary autoEncoder), based on which we designed the ACE-EM method for cryo-EM 3D reconstructions. Compared to previous methods, ACE-EM reached higher pose space coverage within the same training time and boosted the reconstruction performance regardless of the choice of decoders. With this method, the Nyquist resolution (highest possible resolution) was reached for 3D reconstructions of both simulated and experimental cryo-EM datasets. Furthermore, ACE-EM is the only amortized inference method that reached the Nyquist resolution.

Via

Access Paper or Ask Questions

ParCNetV2: Oversized Kernel with Enhanced Attention

Nov 14, 2022

Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

Figure 1 for ParCNetV2: Oversized Kernel with Enhanced Attention

Figure 2 for ParCNetV2: Oversized Kernel with Enhanced Attention

Figure 3 for ParCNetV2: Oversized Kernel with Enhanced Attention

Figure 4 for ParCNetV2: Oversized Kernel with Enhanced Attention

Abstract:Transformers have achieved tremendous success in various computer vision tasks. By borrowing design concepts from transformers, many studies revolutionized CNNs and showed remarkable results. This paper falls in this line of studies. More specifically, we introduce a convolutional neural network architecture named ParCNetV2, which extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units. The oversized convolution utilizes a kernel with $2\times$ the input size to model long-range dependencies through a global receptive field. Simultaneously, it achieves implicit positional encoding by removing the shift-invariant property from convolutional kernels, i.e., the effective kernels at different spatial locations are different when the kernel size is twice as large as the input size. The bifurcate gate unit implements an attention mechanism similar to self-attention in transformers. It splits the input into two branches, one serves as feature transformation while the other serves as attention weights. The attention is applied through element-wise multiplication of the two branches. Besides, we introduce a unified local-global convolution block to unify the design of the early and late stage convolutional blocks. Extensive experiments demonstrate that our method outperforms other pure convolutional neural networks as well as neural networks hybridizing CNNs and transformers.

* 8 pages, 5 figures

Via

Access Paper or Ask Questions