Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minghan Zhu

Vysics: Object Reconstruction Under Occlusion by Fusing Vision and Contact-Rich Physics

Apr 25, 2025

Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo J. Taylor, Michael Posa

Abstract:We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion. Project page: https://vysics-vision-and-physics.github.io/

* B.B. and M.Z. contributed equally to this work. Accepted to RSS 2025. Project page: https://vysics-vision-and-physics.github.io/

Via

Access Paper or Ask Questions

Generative LiDAR Editing with Controllable Novel Object Layouts

Nov 30, 2024

Shing-Hei Ho, Bao Thach, Minghan Zhu

Figure 1 for Generative LiDAR Editing with Controllable Novel Object Layouts

Figure 2 for Generative LiDAR Editing with Controllable Novel Object Layouts

Figure 3 for Generative LiDAR Editing with Controllable Novel Object Layouts

Figure 4 for Generative LiDAR Editing with Controllable Novel Object Layouts

Abstract:We propose a framework to edit real-world Lidar scans with novel object layouts while preserving a realistic background environment. Compared to the synthetic data generation frameworks where Lidar point clouds are generated from scratch, our framework focuses on new scenario generation in a given background environment, and our method also provides labels for the generated data. This approach ensures the generated data remains relevant to the specific environment, aiding both the development and the evaluation of algorithms in real-world scenarios. Compared with novel view synthesis, our framework allows the creation of counterfactual scenarios with significant changes in the object layout and does not rely on multi-frame optimization. In our framework, the object removal and insertion are supported by generative background inpainting and object point cloud completion, and the entire pipeline is built upon spherical voxelization, which realizes the correct Lidar projective geometry by construction. Experiments show that our framework generates realistic Lidar scans with object layout changes and benefits the development of Lidar-based self-driving systems.

* Submitted to IEEE International Conference on Robotics and Automation (ICRA). 6 pages, 7 figures

Via

Access Paper or Ask Questions

Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Oct 15, 2024

Joey Wilson, Ruihan Xu, Yile Sun, Parker Ewen, Minghan Zhu, Kira Barton, Maani Ghaffari

Figure 1 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 2 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 3 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Figure 4 for Latent BKI: Open-Dictionary Continuous Mapping in Visual-Language Latent Spaces with Quantifiable Uncertainty

Abstract:This paper introduces a novel probabilistic mapping algorithm, Latent BKI, which enables open-vocabulary mapping with quantifiable uncertainty. Traditionally, semantic mapping algorithms focus on a fixed set of semantic categories which limits their applicability for complex robotic tasks. Vision-Language (VL) models have recently emerged as a technique to jointly model language and visual features in a latent space, enabling semantic recognition beyond a predefined, fixed set of semantic classes. Latent BKI recurrently incorporates neural embeddings from VL models into a voxel map with quantifiable uncertainty, leveraging the spatial correlations of nearby observations through Bayesian Kernel Inference (BKI). Latent BKI is evaluated against similar explicit semantic mapping and VL mapping frameworks on the popular MatterPort-3D and Semantic KITTI data sets, demonstrating that Latent BKI maintains the probabilistic benefits of continuous mapping with the additional benefit of open-dictionary queries. Real-world experiments demonstrate applicability to challenging indoor environments.

Via

Access Paper or Ask Questions

SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

Jul 23, 2024

Chien Erh Lin, Minghan Zhu, Maani Ghaffari

Figure 1 for SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

Figure 2 for SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

Figure 3 for SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

Figure 4 for SE3ET: SE(3)-Equivariant Transformer for Low-Overlap Point Cloud Registration

Abstract:Partial point cloud registration is a challenging problem in robotics, especially when the robot undergoes a large transformation, causing a significant initial pose error and a low overlap between measurements. This work proposes exploiting equivariant learning from 3D point clouds to improve registration robustness. We propose SE3ET, an SE(3)-equivariant registration framework that employs equivariant point convolution and equivariant transformer designs to learn expressive and robust geometric features. We tested the proposed registration method on indoor and outdoor benchmarks where the point clouds are under arbitrary transformations and low overlapping ratios. We also provide generalization tests and run-time performance.

Via

Access Paper or Ask Questions

Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Oct 06, 2023

Tzu-Yuan Lin, Minghan Zhu, Maani Ghaffari

Figure 1 for Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Figure 2 for Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Figure 3 for Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Figure 4 for Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Abstract:This paper proposes an adjoint-equivariant neural network that takes Lie algebra data as input. Various types of equivariant neural networks have been proposed in the literature, which treat the input data as elements in a vector space carrying certain types of transformations. In comparison, we aim to process inputs that are transformations between vector spaces. The change of basis on transformation is described by conjugations, inducing the adjoint-equivariance relationship that our model is designed to capture. Leveraging the invariance property of the Killing form, the proposed network is a general framework that works for arbitrary semisimple Lie algebras. Our network possesses a simple structure that can be viewed as a Lie algebraic generalization of a multi-layer perceptron (MLP). This work extends the application of equivariant feature learning. As an example, we showcase its value in homography modeling using sl(3) Lie algebra.

Via

Access Paper or Ask Questions

4D Panoptic Segmentation as Invariant and Equivariant Field Prediction

Mar 28, 2023

Minghan Zhu, Shizong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari Jadidi, Fatih Porikli

Abstract:In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation. 4D panoptic segmentation is a recently established benchmark task for autonomous driving, which requires recognizing semantic classes and object instances on the road based on LiDAR scans, as well as assigning temporally consistent IDs to instances across time. We observe that the driving scenario is symmetric to rotations on the ground plane. Therefore, rotation-equivariance could provide better generalization and more robust feature learning. Specifically, we review the object instance clustering strategies, and restate the centerness-based approach and the offset-based approach as the prediction of invariant scalar fields and equivariant vector fields. Other sub-tasks are also unified from this perspective, and different invariant and equivariant layers are designed to facilitate their predictions. Through evaluation on the standard 4D panoptic segmentation benchmark of SemanticKITTI, we show that our equivariant models achieve higher accuracy with lower computational costs compared to their non-equivariant counterparts. Moreover, our method sets the new state-of-the-art performance and achieves 1st place on the SemanticKITTI 4D Panoptic Segmentation leaderboard.

Via

Access Paper or Ask Questions

MonoEdge: Monocular 3D Object Detection Using Local Perspectives

Jan 04, 2023

Minghan Zhu, Lingting Ge, Panqu Wang, Huei Peng

Abstract:We propose a novel approach for monocular 3D object detection by leveraging local perspective effects of each object. While the global perspective effect shown as size and position variations has been exploited for monocular 3D detection extensively, the local perspectives has long been overlooked. We design a local perspective module to regress a newly defined variable named keyedge-ratios as the parameterization of the local shape distortion to account for the local perspective, and derive the object depth and yaw angle from it. Theoretically, this module does not rely on the pixel-wise size or position in the image of the objects, therefore independent of the camera intrinsic parameters. By plugging this module in existing monocular 3D object detection frameworks, we incorporate the local perspective distortion with global perspective effect for monocular 3D reasoning, and we demonstrate the effectiveness and superior performance over strong baseline methods in multiple datasets.

* WACV 2023

Via

Access Paper or Ask Questions

Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Jul 10, 2022

Lei Yang, Xinyu Zhang, Li Wang, Minghan Zhu, Chuang Zhang, Jun Li

Figure 1 for Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Figure 2 for Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Figure 3 for Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Figure 4 for Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Abstract:Monocular 3D object detection is an essential perception task for autonomous driving. However, the high reliance on large-scale labeled data make it costly and time-consuming during model optimization. To reduce such over-reliance on human annotations, we propose Mix-Teaching, an effective semi-supervised learning framework applicable to employ both labeled and unlabeled images in training stage. Mix-Teaching first generates pseudo-labels for unlabeled images by self-training. The student model is then trained on the mixed images possessing much more intensive and precise labeling by merging instance-level image patches into empty backgrounds or labeled images. This is the first to break the image-level limitation and put high-quality pseudo labels from multi frames into one image for semi-supervised training. Besides, as a result of the misalignment between confidence score and localization quality, it's hard to discriminate high-quality pseudo-labels from noisy predictions using only confidence-based criterion. To that end, we further introduce an uncertainty-based filter to help select reliable pseudo boxes for the above mixing operation. To the best of our knowledge, this is the first unified SSL framework for monocular 3D object detection. Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labeling ratios on KITTI dataset. For example, our method achieves around +6.34% AP@0.7 improvement against the GUPNet baseline on validation set when using only 10% labeled data. Besides, by leveraging full training set and the additional 48K raw images of KITTI, it can further improve the MonoFlex by +4.65% improvement on AP@0.7 for car detection, reaching 18.54% AP@0.7, which ranks the 1st place among all monocular based methods on KITTI test leaderboard. The code and pretrained models will be released at https://github.com/yanglei18/Mix-Teaching.

* 11 pages, 5 figures, 7 tables

Via

Access Paper or Ask Questions

E$^2$PN: Efficient SE-Equivariant Point Network

Jun 11, 2022

Minghan Zhu, Maani Ghaffari, William A. Clark, Huei Peng

Figure 1 for E$^2$PN: Efficient SE-Equivariant Point Network

Figure 2 for E$^2$PN: Efficient SE-Equivariant Point Network

Figure 3 for E$^2$PN: Efficient SE-Equivariant Point Network

Figure 4 for E$^2$PN: Efficient SE-Equivariant Point Network

Abstract:This paper proposes a new point-cloud convolution structure that learns SE(3)-equivariant features. Compared with existing SE(3)-equivariant networks, our design is lightweight, simple, and flexible to be incorporated into general point-cloud learning networks. We strike a balance between the complexity and capacity of our model by selecting an unconventional domain for the feature maps. We further reduce the computational load by properly discretizing $\mathbb{R}^3$ to fully leverage the rotational symmetry. Moreover, we employ a permutation layer to recover the full SE(3) group from its quotient space. Experiments show that our method achieves comparable or superior performance in various tasks while consuming much less memory and running faster than existing work. The proposed method can foster the adoption of equivariant feature learning in various practical applications based on point clouds and inspire future developments of equivariant feature learning for real-world applications.

* 9 pages

Via

Access Paper or Ask Questions

Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations

Jul 21, 2021

Minghan Zhu, Maani Ghaffari, Huei Peng

Figure 1 for Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations

Figure 2 for Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations

Figure 3 for Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations

Figure 4 for Correspondence-Free Point Cloud Registration with SO(3)-Equivariant Implicit Shape Representations

Abstract:This paper proposes a correspondence-free method for point cloud rotational registration. We learn an embedding for each point cloud in a feature space that preserves the SO(3)-equivariance property, enabled by recent developments in equivariant neural networks. The proposed shape registration method achieves three major advantages through combining equivariant feature learning with implicit shape models. First, the necessity of data association is removed because of the permutation-invariant property in network architectures similar to PointNet. Second, the registration in feature space can be solved in closed-form using Horn's method due to the SO(3)-equivariance property. Third, the registration is robust to noise in the point cloud because of implicit shape learning. The experimental results show superior performance compared with existing correspondence-free deep registration methods.

* 7 pages. 2 figures. Submitted to CoRL 2021

Via

Access Paper or Ask Questions