Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Na Lei

Robotic Manipulation via Imitation Learning: Taxonomy, Evolution, Benchmark, and Challenges

Aug 24, 2025

Zezeng Li, Alexandre Chapin, Enda Xiang, Rui Yang, Bruno Machado, Na Lei, Emmanuel Dellandrea, Di Huang, Liming Chen

Abstract:Robotic Manipulation (RM) is central to the advancement of autonomous robots, enabling them to interact with and manipulate objects in real-world environments. This survey focuses on RM methodologies that leverage imitation learning, a powerful technique that allows robots to learn complex manipulation skills by mimicking human demonstrations. We identify and analyze the most influential studies in this domain, selected based on community impact and intrinsic quality. For each paper, we provide a structured summary, covering the research purpose, technical implementation, hierarchical classification, input formats, key priors, strengths and limitations, and citation metrics. Additionally, we trace the chronological development of imitation learning techniques within RM policy (RMP), offering a timeline of key technological advancements. Where available, we report benchmark results and perform quantitative evaluations to compare existing methods. By synthesizing these insights, this review provides a comprehensive resource for researchers and practitioners, highlighting both the state of the art and the challenges that lie ahead in the field of robotic manipulation through imitation learning.

Via

Access Paper or Ask Questions

Point2Quad: Generating Quad Meshes from Point Clouds via Face Prediction

Apr 28, 2025

Zezeng Li, Zhihui Qi, Weimin Wang, Ziliang Wang, Junyi Duan, Na Lei

Abstract:Quad meshes are essential in geometric modeling and computational mechanics. Although learning-based methods for triangle mesh demonstrate considerable advancements, quad mesh generation remains less explored due to the challenge of ensuring coplanarity, convexity, and quad-only meshes. In this paper, we present Point2Quad, the first learning-based method for quad-only mesh generation from point clouds. The key idea is learning to identify quad mesh with fused pointwise and facewise features. Specifically, Point2Quad begins with a k-NN-based candidate generation considering the coplanarity and squareness. Then, two encoders are followed to extract geometric and topological features that address the challenge of quad-related constraints, especially by combining in-depth quadrilaterals-specific characteristics. Subsequently, the extracted features are fused to train the classifier with a designed compound loss. The final results are derived after the refinement by a quad-specific post-processing. Extensive experiments on both clear and noise data demonstrate the effectiveness and superiority of Point2Quad, compared to baseline methods under comprehensive metrics.

Via

Access Paper or Ask Questions

Diff-CL: A Novel Cross Pseudo-Supervision Method for Semi-supervised Medical Image Segmentation

Mar 12, 2025

Xiuzhen Guo, Lianyuan Yu, Ji Shi, Na Lei, Hongxiao Wang

Figure 1 for Diff-CL: A Novel Cross Pseudo-Supervision Method for Semi-supervised Medical Image Segmentation

Figure 2 for Diff-CL: A Novel Cross Pseudo-Supervision Method for Semi-supervised Medical Image Segmentation

Figure 3 for Diff-CL: A Novel Cross Pseudo-Supervision Method for Semi-supervised Medical Image Segmentation

Figure 4 for Diff-CL: A Novel Cross Pseudo-Supervision Method for Semi-supervised Medical Image Segmentation

Abstract:Semi-supervised learning utilizes insights from unlabeled data to improve model generalization, thereby reducing reliance on large labeled datasets. Most existing studies focus on limited samples and fail to capture the overall data distribution. We contend that combining distributional information with detailed information is crucial for achieving more robust and accurate segmentation results. On the one hand, with its robust generative capabilities, diffusion models (DM) learn data distribution effectively. However, it struggles with fine detail capture, leading to generated images with misleading details. Combining DM with convolutional neural networks (CNNs) enables the former to learn data distribution while the latter corrects fine details. While capturing complete high-frequency details by CNNs requires substantial computational resources and is susceptible to local noise. On the other hand, given that both labeled and unlabeled data come from the same distribution, we believe that regions in unlabeled data similar to overall class semantics to labeled data are likely to belong to the same class, while regions with minimal similarity are less likely to. This work introduces a semi-supervised medical image segmentation framework from the distribution perspective (Diff-CL). Firstly, we propose a cross-pseudo-supervision learning mechanism between diffusion and convolution segmentation networks. Secondly, we design a high-frequency mamba module to capture boundary and detail information globally. Finally, we apply contrastive learning for label propagation from labeled to unlabeled data. Our method achieves state-of-the-art (SOTA) performance across three datasets, including left atrium, brain tumor, and NIH pancreas datasets.

Via

Access Paper or Ask Questions

Solving Prior Distribution Mismatch in Diffusion Models via Optimal Transport

Oct 17, 2024

Zhanpeng Wang, Shenghao Li, Chen Wang, Shuting Cao, Na Lei, Zhongxuan Luo

Figure 1 for Solving Prior Distribution Mismatch in Diffusion Models via Optimal Transport

Figure 2 for Solving Prior Distribution Mismatch in Diffusion Models via Optimal Transport

Figure 3 for Solving Prior Distribution Mismatch in Diffusion Models via Optimal Transport

Abstract:In recent years, the knowledge surrounding diffusion models(DMs) has grown significantly, though several theoretical gaps remain. Particularly noteworthy is prior error, defined as the discrepancy between the termination distribution of the forward process and the initial distribution of the reverse process. To address these deficiencies, this paper explores the deeper relationship between optimal transport(OT) theory and DMs with discrete initial distribution. Specifically, we demonstrate that the two stages of DMs fundamentally involve computing time-dependent OT. However, unavoidable prior error result in deviation during the reverse process under quadratic transport cost. By proving that as the diffusion termination time increases, the probability flow exponentially converges to the gradient of the solution to the classical Monge-Amp\`ere equation, we establish a vital link between these fields. Therefore, static OT emerges as the most intrinsic single-step method for bridging this theoretical potential gap. Additionally, we apply these insights to accelerate sampling in both unconditional and conditional generation scenarios. Experimental results across multiple image datasets validate the effectiveness of our approach.

Via

Access Paper or Ask Questions

HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Jul 19, 2024

Zezeng Li, Weimin Wang, WenHai Li, Na Lei, Xianfeng Gu

Figure 1 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 2 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 3 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 4 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Abstract:Recent CLIP-guided 3D generation methods have achieved promising results but struggle with generating faithful 3D shapes that conform with input text due to the gap between text and image embeddings. To this end, this paper proposes HOTS3D which makes the first attempt to effectively bridge this gap by aligning text features to the image features with spherical optimal transport (SOT). However, in high-dimensional situations, solving the SOT remains a challenge. To obtain the SOT map for high-dimensional features obtained from CLIP encoding of two modalities, we mathematically formulate and derive the solution based on Villani's theorem, which can directly align two hyper-sphere distributions without manifold exponential maps. Furthermore, we implement it by leveraging input convex neural networks (ICNNs) for the optimal Kantorovich potential. With the optimally mapped features, a diffusion-based generator and a Nerf-based decoder are subsequently utilized to transform them into 3D shapes. Extensive qualitative and qualitative comparisons with state-of-the-arts demonstrate the superiority of the proposed HOTS3D for 3D shape generation, especially on the consistency with text semantics.

Via

Access Paper or Ask Questions

MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

Jul 16, 2024

Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

Abstract:This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by directly forming the face from points. Nevertheless, the challenge of selecting appropriate faces from enormous candidates often leads to undesirable faces and holes. Moreover, the reconstruction performance of both approaches tends to degrade when the point cloud gets sparse. To this end, we propose MEsh Reconstruction via edGE~(MergeNet), which converts mesh reconstruction into local connectivity prediction problems. Specifically, MergeNet learns to extract the features of candidate edges and regress their distances to the underlying surface. Consequently, the predicted distance is utilized to filter out edges that lay on surfaces. Finally, the meshes are reconstructed by refining the triangulations formed by these edges. Extensive experiments on synthetic and real-scanned datasets demonstrate the superiority of MergeNet to SoTA explicit methods.

Via

Access Paper or Ask Questions

Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Jul 01, 2024

Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He

Figure 1 for Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Figure 2 for Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Figure 3 for Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Figure 4 for Learning Unsigned Distance Fields from Local Shape Functions for 3D Surface Reconstruction

Abstract:Unsigned distance fields (UDFs) provide a versatile framework for representing a diverse array of 3D shapes, encompassing both watertight and non-watertight geometries. Traditional UDF learning methods typically require extensive training on large datasets of 3D shapes, which is costly and often necessitates hyperparameter adjustments for new datasets. This paper presents a novel neural framework, LoSF-UDF, for reconstructing surfaces from 3D point clouds by leveraging local shape functions to learn UDFs. We observe that 3D shapes manifest simple patterns within localized areas, prompting us to create a training dataset of point cloud patches characterized by mathematical functions that represent a continuum from smooth surfaces to sharp edges and corners. Our approach learns features within a specific radius around each query point and utilizes an attention mechanism to focus on the crucial features for UDF estimation. This method enables efficient and robust surface reconstruction from point clouds without the need for shape-specific training. Additionally, our method exhibits enhanced resilience to noise and outliers in point clouds compared to existing methods. We present comprehensive experiments and comparisons across various datasets, including synthetic and real-scanned point clouds, to validate our method's efficacy.

* 14 pages, 11 figures

Via

Access Paper or Ask Questions

Global Parameterization-based Texture Space Optimization

Jun 06, 2024

Wei Chen, Yuxue Ren, Na Lei, Zhongxuan Luo, Xianfeng Gu

Figure 1 for Global Parameterization-based Texture Space Optimization

Figure 2 for Global Parameterization-based Texture Space Optimization

Figure 3 for Global Parameterization-based Texture Space Optimization

Figure 4 for Global Parameterization-based Texture Space Optimization

Abstract:Texture mapping is a common technology in the area of computer graphics, it maps the 3D surface space onto the 2D texture space. However, the loose texture space will reduce the efficiency of data storage and GPU memory addressing in the rendering process. Many of the existing methods focus on repacking given textures, but they still suffer from high computational cost and hardly produce a wholly tight texture space. In this paper, we propose a method to optimize the texture space and produce a new texture mapping which is compact based on global parameterization. The proposed method is computationally robust and efficient. Experiments show the effectiveness of the proposed method and the potency in improving the storage and rendering efficiency.

* Preprint submitted to Comput. Math. Math. Phys

Via

Access Paper or Ask Questions

Point Cloud Compression via Constrained Optimal Transport

Mar 13, 2024

Zezeng Li, Weimin Wang, Ziliang Wang, Na Lei

Abstract:This paper presents a novel point cloud compression method COT-PCC by formulating the task as a constrained optimal transport (COT) problem. COT-PCC takes the bitrate of compressed features as an extra constraint of optimal transport (OT) which learns the distribution transformation between original and reconstructed points. Specifically, the formulated COT is implemented with a generative adversarial network (GAN) and a bitrate loss for training. The discriminator measures the Wasserstein distance between input and reconstructed points, and a generator calculates the optimal mapping between distributions of input and reconstructed point cloud. Moreover, we introduce a learnable sampling module for downsampling in the compression procedure. Extensive results on both sparse and dense point cloud datasets demonstrate that COT-PCC outperforms state-of-the-art methods in terms of both CD and PSNR metrics. Source codes are available at \url{https://github.com/cognaclee/PCC-COT}.

Via

Access Paper or Ask Questions

Topology-Aware Latent Diffusion for 3D Shape Generation

Jan 31, 2024

Jiangbei Hu, Ben Fei, Baixin Xu, Fei Hou, Weidong Yang, Shengfa Wang, Na Lei, Chen Qian, Ying He

Abstract:We introduce a new generative model that combines latent diffusion with persistent homology to create 3D shapes with high diversity, with a special emphasis on their topological characteristics. Our method involves representing 3D shapes as implicit fields, then employing persistent homology to extract topological features, including Betti numbers and persistence diagrams. The shape generation process consists of two steps. Initially, we employ a transformer-based autoencoding module to embed the implicit representation of each 3D shape into a set of latent vectors. Subsequently, we navigate through the learned latent space via a diffusion model. By strategically incorporating topological features into the diffusion process, our generative module is able to produce a richer variety of 3D shapes with different topological structures. Furthermore, our framework is flexible, supporting generation tasks constrained by a variety of inputs, including sparse and partial point clouds, as well as sketches. By modifying the persistence diagrams, we can alter the topology of the shapes generated from these input modalities.

* 16 pages, 9 figures

Via

Access Paper or Ask Questions