Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kang You

Efficient LiDAR Reflectance Compression via Scanning Serialization

May 14, 2025

Jiahao Zhu, Kang You, Dandan Ding, Zhan Ma

Abstract:Reflectance attributes in LiDAR point clouds provide essential information for downstream tasks but remain underexplored in neural compression methods. To address this, we introduce SerLiC, a serialization-based neural compression framework to fully exploit the intrinsic characteristics of LiDAR reflectance. SerLiC first transforms 3D LiDAR point clouds into 1D sequences via scan-order serialization, offering a device-centric perspective for reflectance analysis. Each point is then tokenized into a contextual representation comprising its sensor scanning index, radial distance, and prior reflectance, for effective dependencies exploration. For efficient sequential modeling, Mamba is incorporated with a dual parallelization scheme, enabling simultaneous autoregressive dependency capture and fast processing. Extensive experiments demonstrate that SerLiC attains over 2x volume reduction against the original reflectance data, outperforming the state-of-the-art method by up to 22% reduction of compressed bits while using only 2% of its parameters. Moreover, a lightweight version of SerLiC achieves > 10 fps (frames per second) with just 111K parameters, which is attractive for real-world applications.

Via

Access Paper or Ask Questions

RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds

Mar 16, 2025

Kang You, Tong Chen, Dandan Ding, M. Salman Asif, Zhan Ma

Abstract:Despite the substantial advancements demonstrated by learning-based neural models in the LiDAR Point Cloud Compression (LPCC) task, realizing real-time compression - an indispensable criterion for numerous industrial applications - remains a formidable challenge. This paper proposes RENO, the first real-time neural codec for 3D LiDAR point clouds, achieving superior performance with a lightweight model. RENO skips the octree construction and directly builds upon the multiscale sparse tensor representation. Instead of the multi-stage inferring, RENO devises sparse occupancy codes, which exploit cross-scale correlation and derive voxels' occupancy in a one-shot manner, greatly saving processing time. Experimental results demonstrate that the proposed RENO achieves real-time coding speed, 10 fps at 14-bit depth on a desktop platform (e.g., one RTX 3090 GPU) for both encoding and decoding processes, while providing 12.25% and 48.34% bit-rate savings compared to G-PCCv23 and Draco, respectively, at a similar quality. RENO model size is merely 1MB, making it attractive for practical applications. The source code is available at https://github.com/NJUVISION/RENO.

Via

Access Paper or Ask Questions

Att2CPC: Attention-Guided Lossy Attribute Compression of Point Clouds

Oct 23, 2024

Kai Liu, Kang You, Pan Gao, Manoranjan Paul

Abstract:With the great progress of 3D sensing and acquisition technology, the volume of point cloud data has grown dramatically, which urges the development of efficient point cloud compression methods. In this paper, we focus on the task of learned lossy point cloud attribute compression (PCAC). We propose an efficient attention-based method for lossy compression of point cloud attributes leveraging on an autoencoder architecture. Specifically, at the encoding side, we conduct multiple downsampling to best exploit the local attribute patterns, in which effective External Cross Attention (ECA) is devised to hierarchically aggregate features by intergrating attributes and geometry contexts. At the decoding side, the attributes of the point cloud are progressively reconstructed based on the multi-scale representation and the zero-padding upsampling tactic. To the best of our knowledge, this is the first approach to introduce attention mechanism to point-based lossy PCAC task. We verify the compression efficiency of our model on various sequences, including human body frames, sparse objects, and large-scale point cloud scenes. Experiments show that our method achieves an average improvement of 1.15 dB and 2.13 dB in BD-PSNR of Y channel and YUV channel, respectively, when comparing with the state-of-the-art point-based method Deep-PCAC. Codes of this paper are available at https://github.com/I2-Multimedia-Lab/Att2CPC.

Via

Access Paper or Ask Questions

Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Aug 20, 2024

Kai Liu, Kang You, Pan Gao

Figure 1 for Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Figure 2 for Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Figure 3 for Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Figure 4 for Diff-PCC: Diffusion-based Neural Compression for 3D Point Clouds

Abstract:Stable diffusion networks have emerged as a groundbreaking development for their ability to produce realistic and detailed visual content. This characteristic renders them ideal decoders, capable of producing high-quality and aesthetically pleasing reconstructions. In this paper, we introduce the first diffusion-based point cloud compression method, dubbed Diff-PCC, to leverage the expressive power of the diffusion model for generative and aesthetically superior decoding. Different from the conventional autoencoder fashion, a dual-space latent representation is devised in this paper, in which a compressor composed of two independent encoding backbones is considered to extract expressive shape latents from distinct latent spaces. At the decoding side, a diffusion-based generator is devised to produce high-quality reconstructions by considering the shape latents as guidance to stochastically denoise the noisy point clouds. Experiments demonstrate that the proposed Diff-PCC achieves state-of-the-art compression performance (e.g., 7.711 dB BD-PSNR gains against the latest G-PCC standard at ultra-low bitrate) while attaining superior subjective quality. Source code will be made publicly available.

Via

Access Paper or Ask Questions

Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Aug 18, 2024

Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

Figure 1 for Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Figure 2 for Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Figure 3 for Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Figure 4 for Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

Abstract:Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differential binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due to the difficulties in modeling long-range dependencies. To overcome these difficulties, researchers developed variants of LIF neurons and different surrogate gradients but still failed to obtain good results when the sequence became longer (e.g., $>$500). Unlike them, we obtain an optimal SNN in sequence learning by directly mapping parameters from a quantized CRNN. We design two sub-pipelines to support the end-to-end conversion of different structures in neural networks, which is called CNN-Morph (CNN $\rightarrow$ QCNN $\rightarrow$ BIFSNN) and RNN-Morph (RNN $\rightarrow$ QRNN $\rightarrow$ RBIFSNN). Using conversion pipelines and the s-analog encoding method, the conversion error of our framework is zero. Furthermore, we give the theoretical and experimental demonstration of the lossless CRNN-SNN conversion. Our results show the effectiveness of our method over short and long timescales tasks compared with the state-of-the-art learning- and conversion-based methods. We reach the highest accuracy of 99.16% (0.46 $\uparrow$) on S-MNIST, 94.95% (3.95 $\uparrow$) on PS-MNIST (sequence length of 784) respectively, and the lowest loss of 0.057 (0.013 $\downarrow$) within 8 time-steps in collision avoidance dataset.

* Accepted by 33rd International Conference on Artificial Neural Networks

Via

Access Paper or Ask Questions

Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Jul 12, 2024

Zihao Li, Pan Gao, Kang You, Chuan Yan, Manoranjan Paul

Figure 1 for Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Figure 2 for Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Figure 3 for Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Figure 4 for Global Attention-Guided Dual-Domain Point Cloud Feature Learning for Classification and Segmentation

Abstract:Previous studies have demonstrated the effectiveness of point-based neural models on the point cloud analysis task. However, there remains a crucial issue on producing the efficient input embedding for raw point coordinates. Moreover, another issue lies in the limited efficiency of neighboring aggregations, which is a critical component in the network stem. In this paper, we propose a Global Attention-guided Dual-domain Feature Learning network (GAD) to address the above-mentioned issues. We first devise the Contextual Position-enhanced Transformer (CPT) module, which is armed with an improved global attention mechanism, to produce a global-aware input embedding that serves as the guidance to subsequent aggregations. Then, the Dual-domain K-nearest neighbor Feature Fusion (DKFF) is cascaded to conduct effective feature aggregation through novel dual-domain feature learning which appreciates both local geometric relations and long-distance semantic connections. Extensive experiments on multiple point cloud analysis tasks (e.g., classification, part segmentation, and scene semantic segmentation) demonstrate the superior performance of the proposed method and the efficacy of the devised modules.

Via

Access Paper or Ask Questions

BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

Jul 12, 2024

Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

Abstract:Spiking neural networks (SNNs), which mimic biological neural system to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-step) emerge recently. Nevertheless, due to the difficulty in deriving precise gradient estimation for discrete spikes using learning-based method, a distinct accuracy gap persists between SNN and its artificial neural networks (ANNs) counterpart. To address the aforementioned issue, we propose a blurred knowledge distillation (BKD) technique, which leverages random blurred SNN feature to restore and imitate the ANN feature. Note that, our BKD is applied upon the feature map right before the last layer of SNN, which can also mix with prior logits-based knowledge distillation for maximized accuracy boost. To our best knowledge, in the category of learning-based methods, our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets. On ImageNet dataset, BKDSNN outperforms prior best results by 4.51% and 0.93% with the network topology of CNN and Transformer respectively.

Via

Access Paper or Ask Questions

SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Jun 05, 2024

Kang You, Zekai Xu, Chen Nie, Zhijie Deng, Qinghai Guo, Xiang Wang, Zhezhi He

Figure 1 for SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Figure 2 for SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Figure 3 for SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Figure 4 for SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

Abstract:Spiking neural network (SNN) has attracted great attention due to its characteristic of high efficiency and accuracy. Currently, the ANN-to-SNN conversion methods can obtain ANN on-par accuracy SNN with ultra-low latency (8 time-steps) in CNN structure on computer vision (CV) tasks. However, as Transformer-based networks have achieved prevailing precision on both CV and natural language processing (NLP), the Transformer-based SNNs are still encounting the lower accuracy w.r.t the ANN counterparts. In this work, we introduce a novel ANN-to-SNN conversion method called SpikeZIP-TF, where ANN and SNN are exactly equivalent, thus incurring no accuracy degradation. SpikeZIP-TF achieves 83.82% accuracy on CV dataset (ImageNet) and 93.79% accuracy on NLP dataset (SST-2), which are higher than SOTA Transformer-based SNNs. The code is available in GitHub: https://github.com/Intelligent-Computing-Research-Group/SpikeZIP_transformer

* * These authors contributed equally to this work

Via

Access Paper or Ask Questions

Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Apr 21, 2024

Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding

Figure 1 for Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Figure 2 for Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Figure 3 for Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Figure 4 for Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

Abstract:Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance and extremely low-decoding-latency simultaneously. Inspired by conventional Trisoup codec, a point model-based strategy is devised to characterize local surfaces. Specifically, skin features are embedded from local windows via an attention-based encoder, and dilated windows are introduced as cross-scale priors to infer the distribution of quantized features in parallel. During decoding, features undergo fast refinement, followed by a folding-based point generator that reconstructs point coordinates with fairly fast speed. Experiments show that Pointsoup achieves state-of-the-art performance on multiple benchmarks with significantly lower decoding complexity, i.e., up to 90$\sim$160$\times$ faster than the G-PCCv23 Trisoup decoder on a comparatively low-end platform (e.g., one RTX 2080Ti). Furthermore, it offers variable-rate control with a single neural model (2.9MB), which is attractive for industrial practitioners.

Via

Access Paper or Ask Questions

Efficient and Generic Point Model for Lossless Point Cloud Attribute Compression

Apr 10, 2024

Kang You, Pan Gao, Zhan Ma

Abstract:The past several years have witnessed the emergence of learned point cloud compression (PCC) techniques. However, current learning-based lossless point cloud attribute compression (PCAC) methods either suffer from high computational complexity or deteriorated compression performance. Moreover, the significant variations in point cloud scale and sparsity encountered in real-world applications make developing an all-in-one neural model a challenging task. In this paper, we propose PoLoPCAC, an efficient and generic lossless PCAC method that achieves high compression efficiency and strong generalizability simultaneously. We formulate lossless PCAC as the task of inferring explicit distributions of attributes from group-wise autoregressive priors. A progressive random grouping strategy is first devised to efficiently resolve the point cloud into groups, and then the attributes of each group are modeled sequentially from accumulated antecedents. A locality-aware attention mechanism is utilized to exploit prior knowledge from context windows in parallel. Since our method directly operates on points, it can naturally avoids distortion caused by voxelization, and can be executed on point clouds with arbitrary scale and density. Experiments show that our method can be instantly deployed once trained on a Synthetic 2k-ShapeNet dataset while enjoying continuous bit-rate reduction over the latest G-PCCv23 on various datasets (ShapeNet, ScanNet, MVUB, 8iVFB). Meanwhile, our method reports shorter coding time than G-PCCv23 on the majority of sequences with a lightweight model size (2.6MB), which is highly attractive for practical applications. Dataset, code and trained model are available at https://github.com/I2-Multimedia-Lab/PoLoPCAC.

Via

Access Paper or Ask Questions