Shengyuan Zhang

Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion

Apr 16, 2025

An Zhao, Shengyuan Zhang, Ling Yang, Zejian Li, Jiale Wu, Haoran Xu, AnYang Wei, Perry Pengyun GU, Lingyun Sun

Abstract:The application of diffusion models in 3D LiDAR scene completion is limited by diffusion's slow sampling speed. Score distillation accelerates diffusion sampling but degrades performance, while post-training with direct preference optimization (DPO) boosts performance using preference data. This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene completion with preference alignment. First, the student model generates paired completion scenes with different initial noises. Second, using LiDAR scene evaluation metrics as preference, we construct winning and losing sample pairs. Such construction is reasonable, since most LiDAR scene metrics are informative but non-differentiable, so they cannot be optimized directly. Third, Distillation-DPO optimizes the student model by exploiting the difference in score functions between the teacher and student models on the paired completion scenes. This procedure is repeated until convergence. Extensive experiments demonstrate that, compared to state-of-the-art LiDAR scene completion diffusion models, Distillation-DPO achieves higher-quality scene completion while accelerating the completion speed by more than 5-fold. To the best of our knowledge, our method is the first to explore adopting preference learning in distillation, and it provides insights into preference-aligned distillation. Our code is publicly available at https://github.com/happyw1nd/DistillationDPO.
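The pair-construction step in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of Chamfer distance as the preference metric, the helper names (`chamfer_distance`, `make_preference_pair`, `toy_student`), and the noisy toy "student" are all assumptions made for illustration only.

```python
import math
import random


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of xyz tuples)."""
    def one_way(src, dst):
        # Mean distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


def make_preference_pair(candidate_a, candidate_b, ground_truth, metric=chamfer_distance):
    """Return (winner, loser); the candidate with the lower metric value wins."""
    score_a = metric(candidate_a, ground_truth)
    score_b = metric(candidate_b, ground_truth)
    if score_a <= score_b:
        return candidate_a, candidate_b
    return candidate_b, candidate_a


def toy_student(ground_truth, noise_scale, seed):
    """Hypothetical stand-in for a student model: ground truth plus seeded noise."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, noise_scale) for c in p) for p in ground_truth]


# Two completions of the same scene, produced from different initial noises.
ground_truth = [(float(i), float(i % 3), 0.0) for i in range(20)]
sample_a = toy_student(ground_truth, noise_scale=0.05, seed=0)
sample_b = toy_student(ground_truth, noise_scale=0.50, seed=1)
winner, loser = make_preference_pair(sample_a, sample_b, ground_truth)
```

The metric is only compared, never differentiated, which is why a non-differentiable evaluation score can still serve as the preference signal.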

Research and Design on Intelligent Recognition of Unordered Targets for Robots Based on Reinforcement Learning

Mar 10, 2025

Yiting Mao, Dajun Tao, Shengyuan Zhang, Tian Qi, Keqin Li

Abstract:In the field of robot target recognition research driven by artificial intelligence (AI), factors such as the disordered distribution of targets, the complexity of the environment, the massive scale of data, and noise interference have significantly restricted the improvement of target recognition accuracy. Against the backdrop of the continuous iteration and upgrading of current AI technologies, to meet the demand for accurate recognition of disordered targets by intelligent robots in complex and changeable scenarios, this study innovatively proposes an AI-based intelligent robot disordered target recognition method using reinforcement learning. This method processes the collected target images with the bilateral filtering algorithm, decomposing them into low-illumination images and reflection images. Subsequently, it adopts differentiated AI strategies, compressing the illumination images and enhancing the reflection images respectively, and then fuses the two parts of images to generate a new image. On this basis, this study deeply integrates deep learning, a core AI technology, with the reinforcement learning algorithm. The enhanced target images are input into a deep reinforcement learning model for training, ultimately enabling the AI-based intelligent robot to efficiently recognize disordered targets. Experimental results show that the proposed method can not only significantly improve the quality of target images but also enable the AI-based intelligent robot to complete the recognition task of disordered targets with higher efficiency and accuracy, demonstrating extremely high application value and broad development prospects in the field of AI robots.

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Dec 11, 2024

Zejian Li, Chenye Meng, Yize Li, Ling Yang, Shengyuan Zhang, Jiarui Ma, Jiayi Li, Guang Yang, Changyuan Yang, Zhiyuan Yang(+2 more)

Abstract:Recent advances in text-to-image (T2I) generation have shown remarkable success in producing high-quality images from text. However, existing T2I models show degraded performance in compositional image generation involving multiple objects and intricate relationships. We attribute this problem to limitations in existing datasets of image-text pairs, which provide prompts only and lack precise inter-object relationship annotations. To address this problem, we construct LAION-SG, a large-scale dataset with high-quality structural annotations of scene graphs (SG), which precisely describe attributes and relationships of multiple objects, effectively representing the semantic structure in complex scenes. Based on LAION-SG, we train a new foundation model SDXL-SG to incorporate structural annotation information into the generation process. Extensive experiments show that advanced models trained on our LAION-SG achieve significant performance improvements in complex scene generation over models trained on existing datasets. We also introduce CompSG-Bench, a benchmark that evaluates models on compositional image generation, establishing a new standard for this domain.
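To make the idea of a scene-graph annotation concrete, here is a small sketch of one possible in-memory representation. The class and field names (`SceneObject`, `Relation`, `SceneGraph`, `triples`) and the example scene are hypothetical; the actual LAION-SG annotation schema is not given in this abstract and may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneObject:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    subject: int    # index into SceneGraph.objects
    predicate: str
    target: int     # index into SceneGraph.objects


@dataclass
class SceneGraph:
    objects: List[SceneObject]
    relations: List[Relation]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Flatten relations into (subject, predicate, object) name triples."""
        return [
            (self.objects[r.subject].name, r.predicate, self.objects[r.target].name)
            for r in self.relations
        ]


# A made-up annotation for an image of a dog on a sofa beside a lamp.
graph = SceneGraph(
    objects=[
        SceneObject("dog", ["brown"]),
        SceneObject("sofa", ["red"]),
        SceneObject("lamp"),
    ],
    relations=[Relation(0, "sitting on", 1), Relation(2, "next to", 1)],
)
```

Unlike a free-text prompt, this structure states explicitly which attribute belongs to which object and which objects each relation connects.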

Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Dec 04, 2024

Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun

Abstract:Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, the slow sampling speed limits the practical application of diffusion-based scene completion models, since autonomous vehicles require efficient perception of their surrounding environments. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed $\textbf{ScoreLiDAR}$, which achieves efficient yet high-quality scene completion. ScoreLiDAR enables the distilled model to sample in significantly fewer steps after distillation. To improve completion quality, we also introduce a novel $\textbf{Structural Loss}$, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term constraining the holistic structure and a point-wise term constraining the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR reduces the completion time from 30.55 to 5.37 seconds per frame ($>5\times$ faster) on SemanticKITTI and achieves superior performance compared to state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.

Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Aug 28, 2024

Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

Abstract:Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process, because existing methods mainly focus on using the endpoint of pre-trained diffusion models as teacher models, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of teacher models and propose Distribution Backtracking Distillation (DisBack) for distilling student generators. DisBack is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording is designed to obtain the convergence trajectory of teacher models, which records the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the intermediate distributions of teacher models. Then Distribution Backtracking trains a student generator to backtrack the intermediate distributions for approximating the convergence trajectory of teacher models. Extensive experiments show that DisBack achieves faster and better convergence than existing distillation methods and accomplishes comparable generation performance. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available at https://github.com/SYZhang0805/DisBack.
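The generic score-distillation update that this abstract builds on (optimizing a one-step generator with the difference between student and teacher score functions on student samples) can be shown with a toy 1-D sketch. This is not DisBack itself: both distributions are assumed to be unit-variance Gaussians with analytically known scores, the "generator" has a single parameter (its mean), and the names `gaussian_score` and `distill_mean` are hypothetical. In practice the student score is estimated by an auxiliary network.

```python
import random


def gaussian_score(x, mean, std=1.0):
    """Score of N(mean, std^2): the gradient of the log-density with respect to x."""
    return -(x - mean) / (std ** 2)


def distill_mean(teacher_mean, student_mean, steps=200, lr=0.1, batch=64, seed=0):
    """Move the student's mean toward the teacher's using score differences."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            # One-step generator: x = theta + z, so dx/dtheta = 1.
            x = student_mean + rng.gauss(0.0, 1.0)
            # Gradient of KL(student || teacher) w.r.t. theta, estimated on
            # student samples: (student score - teacher score) * dx/dtheta.
            grad += gaussian_score(x, student_mean) - gaussian_score(x, teacher_mean)
        student_mean -= lr * grad / batch
    return student_mean


final_mean = distill_mean(teacher_mean=3.0, student_mean=-2.0)
```

In this toy case the score difference equals the gap between the two means, so the student converges geometrically. The early-stage score mismatch described in the abstract arises in realistic settings, where the score estimates are unreliable on samples far from the teacher distribution.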

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

Nov 07, 2023

Shengzhe Zhou, Zejian Lee, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Lingyun Sun

Abstract:Denoising diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process, but it degrades generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student models. Accordingly, we propose the $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models.

Exploring the Correlation Between Ultrasound Speed and the State of Health of LiFePO$_4$ Prismatic Cells

Sep 24, 2023

Shengyuan Zhang, Peng Zuo, Xuesong Yin, Zheng Fan

Abstract:Electric vehicles (EVs) have become a popular mode of transportation, with their performance depending on the ageing of the Li-ion batteries used to power them. However, it can be challenging and time-consuming to determine the capacity retention of a battery in service. A rapid and reliable testing method for state of health (SoH) determination is desired. Ultrasonic testing techniques are promising due to their efficient, portable, and non-destructive features. In this study, we demonstrate that ultrasonic speed decreases with the degradation of the capacity of an LFP prismatic cell. We explain this correlation through numerical simulation, which describes wave propagation in porous media. We propose that the reduction of binder stiffness can be a primary cause of the change in ultrasonic speed during battery ageing. This work brings new insights into ultrasonic SoH estimation techniques.
