Zhiyong Su

Open-world Point Cloud Semantic Segmentation: A Human-in-the-loop Framework

Aug 07, 2025

Peng Zhang, Songru Yang, Jinsheng Sun, Weiqing Li, Zhiyong Su

Abstract:Open-world point cloud semantic segmentation (OW-Seg) aims to predict point labels of both base and novel classes in real-world scenarios. However, existing methods rely on resource-intensive offline incremental learning or densely annotated support data, limiting their practicality. To address these limitations, we propose HOW-Seg, the first human-in-the-loop framework for OW-Seg. Specifically, we construct class prototypes, the fundamental segmentation units, directly on the query data, avoiding the prototype bias caused by intra-class distribution shifts between the support and query data. By leveraging sparse human annotations as guidance, HOW-Seg enables prototype-based segmentation for both base and novel classes. Considering the lack of granularity of initial prototypes, we introduce a hierarchical prototype disambiguation mechanism to refine ambiguous prototypes, which correspond to annotations of different classes. To further enrich contextual awareness, we employ a dense conditional random field (CRF) upon the refined prototypes to optimize their label assignments. Through iterative human feedback, HOW-Seg dynamically improves its predictions, achieving high-quality segmentation for both base and novel classes. Experiments demonstrate that with sparse annotations (e.g., one-novel-class-one-click), HOW-Seg matches or surpasses the state-of-the-art generalized few-shot segmentation (GFS-Seg) method under the 5-shot setting. When using advanced backbones (e.g., Stratified Transformer) and denser annotations (e.g., 10 clicks per sub-scene), HOW-Seg achieves 85.27% mIoU on S3DIS and 66.37% mIoU on ScanNetv2, significantly outperforming alternatives.

* To be published in IEEE Transactions on Circuits and Systems for Video Technology
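
The click-guided prototype labeling described in the abstract can be illustrated with a minimal sketch. It assumes that prototypes and clicked points are plain feature vectors compared by squared Euclidean distance; the names `nearest_prototype` and `label_prototypes` are hypothetical and are not taken from the paper. A prototype that receives clicks of more than one class is flagged as ambiguous, which is the case the hierarchical disambiguation mechanism is said to refine.

```python
def nearest_prototype(feature, prototypes):
    """Index of the prototype closest to `feature` (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(feature, prototypes[k])),
    )


def label_prototypes(prototypes, clicks):
    """Propagate sparse click labels to prototypes.

    clicks: list of (feature, class_label) pairs, one per human click.
    Returns (labels, ambiguous):
      labels    - dict mapping prototype index to its single clicked class
      ambiguous - sorted indices of prototypes hit by clicks of different classes
    """
    classes_per_prototype = {}
    for feature, class_label in clicks:
        k = nearest_prototype(feature, prototypes)
        classes_per_prototype.setdefault(k, set()).add(class_label)

    labels = {k: next(iter(c)) for k, c in classes_per_prototype.items() if len(c) == 1}
    ambiguous = sorted(k for k, c in classes_per_prototype.items() if len(c) > 1)
    return labels, ambiguous


# Toy example: two prototypes, three clicks (assumed 2-D features).
prototypes = [[0.0, 0.0], [10.0, 10.0]]
clicks = [([0.1, 0.0], "chair"), ([9.0, 10.0], "wall"), ([10.0, 9.0], "lamp")]
labels, ambiguous = label_prototypes(prototypes, clicks)
```

In this toy case prototype 0 is labeled "chair", while prototype 1 collects both "wall" and "lamp" clicks and would therefore need to be split before it can be assigned a label.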

No-reference geometry quality assessment for colorless point clouds via list-wise rank learning

Feb 17, 2025

Zheng Li, Bingxu Xie, Chao Chu, Weiqing Li, Zhiyong Su

Abstract:Geometry quality assessment (GQA) of colorless point clouds is crucial for evaluating the performance of emerging point cloud-based solutions (e.g., watermarking, compression, and three-dimensional (3D) reconstruction). Unfortunately, existing objective GQA approaches are traditional full-reference metrics, whereas state-of-the-art learning-based point cloud quality assessment (PCQA) methods target both color and geometry distortions; neither is suited to the no-reference GQA task. In addition, the lack of large-scale GQA datasets with subjective scores, which are always imprecise, biased, and inconsistent, also hinders the development of learning-based GQA metrics. Driven by these limitations, this paper proposes a no-reference geometry-only quality assessment approach based on list-wise rank learning, termed LRL-GQA, which comprises a geometry quality assessment network (GQANet) and a list-wise rank learning network (LRLNet). The proposed LRL-GQA formulates no-reference GQA as a list-wise ranking problem, with the objective of directly optimizing the entire quality ordering. Specifically, a large dataset containing a variety of geometry-only distortions, named the LRL dataset, is constructed first; each sample in it is label-free but coupled with quality ranking information. Then, the GQANet is designed to capture intrinsic multi-scale patch-wise geometric features in order to predict a quality index for each point cloud. After that, the LRLNet leverages the LRL dataset and a likelihood loss to train the GQANet and ranks the input list of degraded point clouds according to their distortion levels. In addition, the pre-trained GQANet can be fine-tuned further to obtain absolute quality scores. Experimental results demonstrate the superior performance of the proposed no-reference LRL-GQA method compared with existing full-reference GQA metrics.

* Computers & Graphics, Volume 127, April 2025, 104176
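
The abstract does not give the form of the likelihood loss. As an illustration only, the sketch below assumes a Plackett-Luce (ListMLE-style) negative log-likelihood, a common choice for list-wise ranking. The name `listwise_nll` is hypothetical, and higher predicted scores are assumed to mean better quality.

```python
import math


def listwise_nll(scores_in_true_order):
    """Negative log-likelihood of the ground-truth ordering under Plackett-Luce.

    scores_in_true_order[i] is the predicted quality index of the sample that
    the ground-truth ranking places at position i (best first).
    """
    nll = 0.0
    for i, score in enumerate(scores_in_true_order):
        tail = scores_in_true_order[i:]
        # Numerically stable log-sum-exp over the items not yet ranked.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        nll += log_z - score
    return nll


# Scores that agree with the true ordering give a lower loss than reversed ones.
loss_consistent = listwise_nll([3.0, 2.0, 1.0])
loss_reversed = listwise_nll([1.0, 2.0, 3.0])
```

Because the loss depends only on the ordering of the list, it can be computed from ranking information alone, which matches the label-free setting described above.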

The Worse The Better: Content-Aware Viewpoint Generation Network for Projection-related Point Cloud Quality Assessment

Feb 17, 2025

Zhiyong Su, Bingxu Xie, Zheng Li, Jincan Wu, Weiqing Li

Abstract:Through experimental studies, however, we observed the instability of final predicted quality scores, which change significantly over different viewpoint settings. Inspired by the "wooden barrel theory", given the default content-independent viewpoints of existing projection-related PCQA approaches, this paper presents a novel content-aware viewpoint generation network (CAVGN) to learn better viewpoints by taking the distribution of geometric and attribute features of degraded point clouds into consideration. Firstly, the proposed CAVGN extracts multi-scale geometric and texture features of the entire input point cloud, respectively. Then, for each default content-independent viewpoint, the extracted geometric and texture features are refined to focus on its corresponding visible part of the input point cloud. Finally, the refined geometric and texture features are concatenated to generate an optimized viewpoint. To train the proposed CAVGN, we present a self-supervised viewpoint ranking network (SSVRN) to select the viewpoint with the worst quality projected image to construct a default-optimized viewpoint dataset, which consists of thousands of paired default viewpoints and corresponding optimized viewpoints. Experimental results show that the projection-related PCQA methods can achieve higher performance using the viewpoints generated by the proposed CAVGN.

* To be published in IEEE Transactions on Circuits and Systems for Video Technology

Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

Aug 10, 2024

Fan Zhang, Ziyue Ji, Weiguang Kang, Weiqing Li, Zhiyong Su

Abstract:With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, the 3D virtual eyeglasses try-on application is well on its way to becoming a new trending solution that offers a "try on" option to select the perfect pair of eyeglasses from the comfort of your own home. Reconstructing eyeglasses frames from a single image with traditional depth-based and image-based methods is extremely difficult due to their unique characteristics, such as a lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on the construction of a synthetic eyeglasses frame dataset, we first define a class-specific eyeglasses frame template with pre-defined keypoints. Then, given an input eyeglasses frame image with thin structure and few texture features, we design a keypoint detector and refiner to detect the pre-defined keypoints in a coarse-to-fine manner and thereby estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach for producing correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions to enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, per-pixel shading information, and so on. Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.

* Graphical Models, Volume 135, October 2024, 101225
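
For readers unfamiliar with free-form deformation, the sketch below shows the classical trivariate Bernstein formulation, in which a point with local coordinates (s, t, u) in the unit cube is moved by a lattice of control points. This is a generic illustration; the abstract does not state the lattice resolution or parameterization used in the paper, and the names `ffd` and `identity_lattice` are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def identity_lattice(l, m, n):
    """Undeformed (l+1) x (m+1) x (n+1) control lattice spanning the unit cube."""
    return [[[[i / l, j / m, k / n] for k in range(n + 1)]
             for j in range(m + 1)]
            for i in range(l + 1)]


def ffd(point, lattice):
    """Deform `point` = (s, t, u) in [0, 1]^3 by the control lattice."""
    s, t, u = point
    l, m, n = len(lattice) - 1, len(lattice[0]) - 1, len(lattice[0][0]) - 1
    out = [0.0, 0.0, 0.0]
    for i in range(l + 1):
        for j in range(m + 1):
            for k in range(n + 1):
                w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                for d in range(3):
                    out[d] += w * lattice[i][j][k][d]
    return out


# Moving one control point drags nearby geometry smoothly along with it.
lattice = identity_lattice(1, 1, 1)
lattice[1][1][1] = [2.0, 1.0, 1.0]  # pull the far corner along x
moved = ffd((0.25, 0.5, 0.75), lattice)
```

With the undeformed lattice the mapping is the identity, so optimizing control-point displacements (for example through a differentiable renderer) deforms the template mesh starting from its original shape.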

Fine-grained Metrics for Point Cloud Semantic Segmentation

Jul 31, 2024

Zhuheng Lu, Ting Wu, Yuewei Dai, Weiqing Li, Zhiyong Su

Abstract:Two forms of imbalances are commonly observed in point cloud semantic segmentation datasets: (1) category imbalances, where certain objects are more prevalent than others; and (2) size imbalances, where certain objects occupy more points than others. Because of this, the majority of categories and large objects are favored in the existing evaluation metrics. This paper suggests fine-grained mIoU and mAcc for a more thorough assessment of point cloud segmentation algorithms in order to address these issues. Richer statistical information is provided for models and datasets by these fine-grained metrics, which also lessen the bias of current semantic segmentation metrics towards large objects. The proposed metrics are used to train and assess various semantic segmentation algorithms on three distinct indoor and outdoor semantic segmentation datasets.

* PRCV 2024
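
The abstract does not define the fine-grained metrics precisely. The sketch below is one hedged reading for illustration: it contrasts the usual mIoU, computed after pooling all points in the dataset, with a variant that computes each class IoU per scene and then averages, so that a small missed object is not drowned out by a large correct one. The function names and the per-scene granularity are assumptions, not the paper's definitions.

```python
def class_iou(pred, gt, cls):
    """IoU of one class over paired label lists; None if the class is absent."""
    inter = sum(1 for p, g in zip(pred, gt) if p == cls and g == cls)
    union = sum(1 for p, g in zip(pred, gt) if p == cls or g == cls)
    return inter / union if union else None


def mean(values):
    return sum(values) / len(values)


def pooled_miou(scenes, classes):
    """Conventional mIoU: pool every point of every scene, then average over classes."""
    pred = [p for scene_pred, _ in scenes for p in scene_pred]
    gt = [g for _, scene_gt in scenes for g in scene_gt]
    ious = [class_iou(pred, gt, c) for c in classes]
    return mean([v for v in ious if v is not None])


def per_scene_miou(scenes, classes):
    """Assumed finer-grained variant: average each class IoU over scenes first."""
    per_class = []
    for c in classes:
        ious = [class_iou(pred, gt, c) for pred, gt in scenes]
        ious = [v for v in ious if v is not None]
        if ious:
            per_class.append(mean(ious))
    return mean(per_class)


# Each scene is (predicted labels, ground-truth labels).
scenes = [
    ([1] * 8 + [0] * 2, [1] * 8 + [0] * 2),  # large class-1 object, fully correct
    ([0, 0, 0, 0], [1, 1, 0, 0]),            # small class-1 object, fully missed
]
pooled = pooled_miou(scenes, [0, 1])
fine = per_scene_miou(scenes, [0, 1])
```

In this toy case the pooled score is 11/15 (about 0.733), while the per-scene score drops to 0.625 because the missed small object counts as much as the large one.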

Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation

Mar 11, 2024

Peng Zhang, Ting Wu, Jinsheng Sun, Weiqing Li, Zhiyong Su

Abstract:Existing interactive point cloud segmentation approaches primarily focus on object segmentation, which aims to determine which points belong to the object of interest, guided by user interactions. This paper concentrates on an unexplored yet meaningful task, i.e., interactive point cloud semantic segmentation, which assigns high-quality semantic labels to all points in a scene with user corrective clicks. Concretely, we present the first interactive framework for point cloud semantic segmentation, named InterPCSeg, which seamlessly integrates with off-the-shelf semantic segmentation networks without offline re-training, enabling it to run in an on-the-fly manner. To achieve online refinement, we treat user interactions as sparse training examples at test time. To address the instability caused by the sparse supervision, we design a stabilization energy to regulate the test-time training process. For objective and reproducible evaluation, we develop an interaction simulation scheme tailored for the interactive point cloud semantic segmentation task. We evaluate our framework on the S3DIS and ScanNet datasets with off-the-shelf segmentation networks, incorporating interactions from both the proposed interaction simulator and real users. Quantitative and qualitative experimental results demonstrate the efficacy of our framework in refining the semantic segmentation results with user interactions. The source code will be publicly available.

Hypergraph Convolutional Network based Weakly Supervised Point Cloud Semantic Segmentation with Scene-Level Annotations

Nov 02, 2022

Zhuheng Lu, Peng Zhang, Yuewei Dai, Weiqing Li, Zhiyong Su

Abstract:Point cloud segmentation with scene-level annotations is a promising but challenging task. Currently, the most popular approach is to employ the class activation map (CAM) to locate discriminative regions and then generate point-level pseudo labels from scene-level annotations. However, these methods always suffer from the point imbalance among categories, as well as the sparse and incomplete supervision from CAM. In this paper, we propose a novel weighted hypergraph convolutional network-based method, called WHCN, to confront the challenges of learning point-wise labels from scene-level annotations. Firstly, in order to simultaneously overcome the point imbalance among different categories and reduce the model complexity, superpoints of a training point cloud are generated by exploiting the geometrically homogeneous partition. Then, a hypergraph is constructed based on the high-confidence superpoint-level seeds, which are converted from scene-level annotations. Next, the WHCN takes the hypergraph as input and learns to predict high-precision point-level pseudo labels by label propagation. Besides the backbone network consisting of spectral hypergraph convolution blocks, a hyperedge attention module is learned to adjust the weights of hyperedges in the WHCN. Finally, a segmentation network is trained on these pseudo point cloud labels. We conduct comprehensive experiments on the ScanNet and S3DIS segmentation datasets. Experimental results demonstrate that the proposed WHCN effectively predicts point labels from scene-level annotations and yields state-of-the-art results. The source code is available at http://zhiyongsu.github.io/Project/WHCN.html.

Joint Data and Feature Augmentation for Self-Supervised Representation Learning on Point Clouds

Nov 02, 2022

Zhuheng Lu, Yuewei Dai, Weiqing Li, Zhiyong Su

Abstract:To avoid exhausting annotation effort, self-supervised representation learning from unlabeled point clouds has drawn much attention, especially augmentation-based contrastive methods. However, specific augmentations hardly produce sufficient transferability to high-level tasks on different datasets. Besides, augmentations on point clouds may also change the underlying semantics. To address these issues, we propose a simple but efficient augmentation fusion contrastive learning framework that combines data augmentations in Euclidean space with feature augmentations in feature space. In particular, we propose a data augmentation method based on sampling and graph generation. Meanwhile, we design a data augmentation network to enable a correspondence of representations by maximizing consistency between augmented graph pairs. We further design a feature augmentation network that uses an encoder perturbation to encourage the model to learn representations invariant to perturbations. We conduct extensive object classification and object part segmentation experiments to validate the transferability of the proposed framework. Experimental results demonstrate that the proposed framework effectively learns point cloud representations in a self-supervised manner and yields state-of-the-art results. The source code is publicly available at: https://zhiyongsu.github.io/Project/AFSRL.html.

AU-PD: An Arbitrary-size and Uniform Downsampling Framework for Point Clouds

Nov 02, 2022

Peng Zhang, Ruoyin Xie, Jinsheng Sun, Weiqing Li, Zhiyong Su

Abstract:Point cloud downsampling is a crucial pre-processing operation that reduces the number of points in a point cloud in order to lower computational cost and communication load, among other benefits. Recent research on point cloud downsampling, which concentrates on learning to sample in a task-aware way, has achieved great success. However, existing learnable samplers cannot perform arbitrary-size sampling directly. Moreover, their sampled results always comprise many overlapping points. In this paper, we introduce AU-PD, a novel task-aware sampling framework that directly downsamples a point cloud to any smaller size based on a sample-to-refine strategy. Given a specified arbitrary size, we first perform task-agnostic pre-sampling to sample the input point cloud. Then, we refine the pre-sampled set to make it task-aware, driven by downstream task losses. The refinement is realized by adding to each pre-sampled point a small offset predicted by point-wise multi-layer perceptrons (MLPs). In this way, the distribution of the sampled set remains almost unchanged from that of the original, and the set therefore contains fewer overlapping points. With the attention mechanism and a proper training scheme, the framework learns to adaptively refine pre-sampled sets of different sizes. We evaluate the sampled results on classification and registration tasks. The proposed AU-PD achieves downstream performance competitive with the state-of-the-art method while being more flexible and containing fewer overlapping points in the sampled set. The source code will be publicly available at https://zhiyongsu.github.io/Project/AUPD.html.
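
The sample-to-refine strategy can be sketched in a few lines. The abstract does not name the task-agnostic pre-sampler, so farthest point sampling is assumed here, and the learned point-wise MLP is replaced by a caller-supplied `offset_fn`; both choices and all function names are illustrative assumptions rather than the paper's implementation.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, m, start=0):
    """Task-agnostic pre-sampling (assumed): pick m well-spread point indices.

    Requires 1 <= m <= len(points).
    """
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_d2 = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_d2[i])
        chosen.append(nxt)
        min_d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(min_d2, points)]
    return chosen


def refine(points, indices, offset_fn):
    """Add a small per-point offset to each pre-sampled point.

    offset_fn stands in for the learned point-wise MLP and returns one
    offset vector per input point.
    """
    return [
        [c + o for c, o in zip(points[i], offset_fn(points[i]))]
        for i in indices
    ]


# Eleven points on a line; any target size m <= 11 can be requested directly.
points = [[float(i), 0.0, 0.0] for i in range(11)]
indices = farthest_point_sampling(points, 3)
sampled = refine(points, indices, lambda p: [0.0, 0.1, 0.0])
```

Because the target size is only an argument to the pre-sampler and the refinement acts on each point independently, the same pipeline serves any output size, and small offsets keep the refined set close to the pre-sampled distribution.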
