Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xianzhi Li

Towards Pixel-Wise Anomaly Location for High-Resolution PCBA via Self-Supervised Image Reconstruction

Dec 22, 2025

Wuyi Liu, Le Jin, Junxian Yang, Yuanchao Yu, Zishuo Peng, Jinfeng Xu, Xianzhi Li, Jun Zhou

Abstract:Automated defect inspection of assembled Printed Circuit Board Assemblies (PCBA) is quite challenging due to the insufficient labeled data, micro-defects with just a few pixels in visually-complex and high-resolution images. To address these challenges, we present HiSIR-Net, a High resolution, Self-supervised Reconstruction framework for pixel-wise PCBA localization. Our design combines two lightweight modules that make this practical on real 4K-resolution boards: (i) a Selective Input-Reconstruction Gate (SIR-Gate) that lets the model decide where to trust reconstruction versus the original input, thereby reducing irrelevant reconstruction artifacts and false alarms; and (ii) a Region-level Optimized Patch Selection (ROPS) scheme with positional cues to select overlapping patch reconstructions coherently across arbitrary resolutions. Organically integrating these mechanisms yields clean, high-resolution anomaly maps with low false positive (FP) rate. To bridge the gap in high-resolution PCBA datasets, we further contribute a self-collected dataset named SIPCBA-500 of 500 images. We conduct extensive experiments on our SIPCBA-500 as well as public benchmarks, demonstrating the superior localization performance of our method while running at practical speed. Full code and dataset will be made available upon acceptance.

Via

Access Paper or Ask Questions

SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds

Jun 16, 2025

Jinfeng Xu, Xianzhi Li, Yuan Tang, Xu Han, Qiao Yu, Yixue Hao, Long Hu, Min Chen

Abstract:Recent advancements in deep learning have greatly enhanced 3D object recognition, but most models are limited to closed-set scenarios, unable to handle unknown samples in real-world applications. Open-set recognition (OSR) addresses this limitation by enabling models to both classify known classes and identify novel classes. However, current OSR methods rely on global features to differentiate known and unknown classes, treating the entire object uniformly and overlooking the varying semantic importance of its different parts. To address this gap, we propose Salience-Aware Structured Separation (SASep), which includes (i) a tunable semantic decomposition (TSD) module to semantically decompose objects into important and unimportant parts, (ii) a geometric synthesis strategy (GSS) to generate pseudo-unknown objects by combining these unimportant parts, and (iii) a synth-aided margin separation (SMS) module to enhance feature-level separation by expanding the feature distributions between classes. Together, these components improve both geometric and feature representations, enhancing the model's ability to effectively distinguish known and unknown classes. Experimental results show that SASep achieves superior performance in 3D OSR, outperforming existing state-of-the-art methods.

* 10 pages, conference

Via

Access Paper or Ask Questions

Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning

Apr 26, 2025

Abdellah Ghassel, Xianzhi Li, Xiaodan Zhu

Figure 1 for Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning

Figure 2 for Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning

Figure 3 for Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning

Figure 4 for Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning

Abstract:Large language models (LLMs) are rapidly changing various domains. However, their capabilities in handling conversational breakdowns still require an in-depth exploration. This paper addresses the challenge of detecting and mitigating dialogue breakdowns within LLM-driven conversational systems. While powerful models from OpenAI and Anthropic excel in many dialogue tasks, they can still produce incoherent or contradictory responses, commonly referred to as breakdowns, which undermine user trust. To tackle this, we propose an approach that combines specialized fine-tuning with advanced prompting strategies, including few-shot learning, chain-of-thought reasoning, and analogical prompting. In particular, we fine-tune a small 8B model and demonstrate its robust classification and calibration capabilities in English and Japanese dialogue. We also validate its generalization on the BETOLD dataset, achieving a 7\% accuracy improvement over its base model. Furthermore, we introduce a real-time deployment architecture that selectively escalates suspicious responses to more resource-intensive frontier models only when breakdowns are detected, significantly cutting operational expenses and energy consumption. Experimental results show our method surpasses prior state-of-the-art specialized classifiers while also narrowing performance gaps between smaller open-source models and large proprietary ones. Our approach offers a scalable solution for robust conversational AI in high-impact domains by combining efficiency, interpretability, and reliability.

Via

Access Paper or Ask Questions

Entropy-Aware Branching for Improved Mathematical Reasoning

Mar 27, 2025

Xianzhi Li, Ethan Callanan, Xiaodan Zhu, Mathieu Sibue, Antony Papadimitriou, Mahmoud Mahfouz, Zhiqiang Ma, Xiaomo Liu

Abstract:While Large Language Models (LLMs) are effectively aligned through extensive pre-training and fine-tuning, they still struggle with varying levels of uncertainty during token generation. In our investigation of mathematical reasoning, we observe that errors are more likely to arise at tokens exhibiting high entropy and variance of entropy in the model's output distribution. Based on the observation, we propose a novel approach that dynamically branches the generation process on demand instead of defaulting to the single most probable token. By exploring in parallel multiple branches stemming from high probability tokens of critical decision points, the model can discover diverse reasoning paths that might otherwise be missed. We further harness external feedback from larger models to rank and select the most coherent and accurate reasoning branch. Our experimental results on mathematical word problems and calculation questions show that this branching strategy boosts the reasoning capabilities of small LLMs up to 4.6% compared to conventional argmax decoding.

Via

Access Paper or Ask Questions

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Nov 25, 2024

Qiao Yu, Xianzhi Li, Yuan Tang, Xu Han, Long Hu, Yixue Hao, Min Chen

Figure 1 for Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Figure 2 for Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Figure 3 for Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Figure 4 for Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Abstract:Generating 3D meshes from a single image is an important but ill-posed task. Existing methods mainly adopt 2D multiview diffusion models to generate intermediate multiview images, and use the Large Reconstruction Model (LRM) to create the final meshes. However, the multiview images exhibit local inconsistencies, and the meshes often lack fidelity to the input image or look blurry. We propose Fancy123, featuring two enhancement modules and an unprojection operation to address the above three issues, respectively. The appearance enhancement module deforms the 2D multiview images to realign misaligned pixels for better multiview consistency. The fidelity enhancement module deforms the 3D mesh to match the input image. The unprojection of the input image and deformed multiview images onto LRM's generated mesh ensures high clarity, discarding LRM's predicted blurry-looking mesh colors. Extensive qualitative and quantitative experiments verify Fancy123's SoTA performance with significant improvement. Also, the two enhancement modules are plug-and-play and work at inference time, allowing seamless integration into various existing single-image-to-3D methods.

* Project page: https://github.com/YuQiao0303/Fancy123

Via

Access Paper or Ask Questions

Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

Oct 03, 2024

Xianzhi Li, Ran Zmigrod, Zhiqiang Ma, Xiaomo Liu, Xiaodan Zhu

Figure 1 for Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

Figure 2 for Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

Figure 3 for Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

Figure 4 for Fine-Tuning Language Models with Differential Privacy through Adaptive Noise Allocation

Abstract:Language models are capable of memorizing detailed patterns and information, leading to a double-edged effect: they achieve impressive modeling performance on downstream tasks with the stored knowledge but also raise significant privacy concerns. Traditional differential privacy based training approaches offer robust safeguards by employing a uniform noise distribution across all parameters. However, this overlooks the distinct sensitivities and contributions of individual parameters in privacy protection and often results in suboptimal models. To address these limitations, we propose ANADP, a novel algorithm that adaptively allocates additive noise based on the importance of model parameters. We demonstrate that ANADP narrows the performance gap between regular fine-tuning and traditional DP fine-tuning on a series of datasets while maintaining the required privacy constraints.

* EMNLP 2024 findings

Via

Access Paper or Ask Questions

RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Aug 29, 2024

Zhaoxuan Wang, Xu Han, Hongxin Liu, Xianzhi Li

Figure 1 for RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Figure 2 for RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Figure 3 for RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Figure 4 for RIDE: Boosting 3D Object Detection for LiDAR Point Clouds via Rotation-Invariant Analysis

Abstract:The rotation robustness property has drawn much attention to point cloud analysis, whereas it still poses a critical challenge in 3D object detection. When subjected to arbitrary rotation, most existing detectors fail to produce expected outputs due to the poor rotation robustness. In this paper, we present RIDE, a pioneering exploration of Rotation-Invariance for the 3D LiDAR-point-based object DEtector, with the key idea of designing rotation-invariant features from LiDAR scenes and then effectively incorporating them into existing 3D detectors. Specifically, we design a bi-feature extractor that extracts (i) object-aware features though sensitive to rotation but preserve geometry well, and (ii) rotation-invariant features, which lose geometric information to a certain extent but are robust to rotation. These two kinds of features complement each other to decode 3D proposals that are robust to arbitrary rotations. Particularly, our RIDE is compatible and easy to plug into the existing one-stage and two-stage 3D detectors, and boosts both detection performance and rotation robustness. Extensive experiments on the standard benchmarks showcase that the mean average precision (mAP) and rotation robustness can be significantly boosted by integrating with our RIDE, with +5.6% mAP and 53% rotation robustness improvement on KITTI, +5.1% and 28% improvement correspondingly on nuScenes. The code will be available soon.

Via

Access Paper or Ask Questions

More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Aug 28, 2024

Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Jinfeng Xu, Yixue Hao, Long Hu, Min Chen

Figure 1 for More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Figure 2 for More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Figure 3 for More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Figure 4 for More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

Abstract:Enabling Large Language Models (LLMs) to comprehend the 3D physical world remains a significant challenge. Due to the lack of large-scale 3D-text pair datasets, the success of LLMs has yet to be replicated in 3D understanding. In this paper, we rethink this issue and propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understanding with minimal 3D point cloud and text data pairs. To address this task, we introduce GreenPLM, which leverages more text data to compensate for the lack of 3D data. First, inspired by using CLIP to align images and text, we utilize a pre-trained point cloud-text encoder to map the 3D point cloud space to the text space. This mapping leaves us to seamlessly connect the text space with LLMs. Once the point-text-LLM connection is established, we further enhance text-LLM alignment by expanding the intermediate text space, thereby reducing the reliance on 3D point cloud data. Specifically, we generate 6M free-text descriptions of 3D objects, and design a three-stage training strategy to help LLMs better explore the intrinsic connections between different modalities. To achieve efficient modality alignment, we design a zero-parameter cross-attention module for token pooling. Extensive experimental results show that GreenPLM requires only 12% of the 3D training data used by existing state-of-the-art models to achieve superior 3D understanding. Remarkably, GreenPLM also achieves competitive performance using text-only data. The code and weights are available at: https://github.com/TangYuan96/GreenPLM.

Via

Access Paper or Ask Questions

PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting

Jun 22, 2024

Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen

Abstract:Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or the lack of generalization capability. Addressing this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point cloud. It produces meshes with enhanced fidelity and clarity by 2D image inpainting, taking advantage of the mature techniques and massive data of 2D vision. Specifically, we first project the input point cloud into 2D space to generate sparse multi-view images, and then inpaint empty pixels utilizing a pre-trained 2D diffusion model. Next, we design a novel Non-Border-First strategy to unproject the colors of the inpainted dense images back to 3D space, thus obtaining the final textured mesh. In this way, our PointDreamer works in a zero-shot manner, requiring no extra training. Extensive qualitative and quantitative experiments on various synthetic and real-scanned datasets show the SoTA performance of PointDreamer, by significantly outperforming baseline methods with 30\% improvement in LPIPS score (from 0.118 to 0.068). Code at: https://github.com/YuQiao0303/PointDreamer.

Via

Access Paper or Ask Questions

MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

May 02, 2024

Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen

Figure 1 for MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Figure 2 for MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Figure 3 for MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Figure 4 for MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Abstract:Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, which hinders the development of 3D-LLMs. In this paper, we introduce MiniGPT-3D, an efficient and powerful 3D-LLM that achieves multiple SOTA results while training for only 27 hours on one RTX 3090. Specifically, we propose to align 3D point clouds with LLMs using 2D priors from 2D-LLMs, which can leverage the similarity between 2D and 3D visual information. We introduce a novel four-stage training strategy for modality alignment in a cascaded way, and a mixture of query experts module to adaptively aggregate features with high efficiency. Moreover, we utilize parameter-efficient fine-tuning methods LoRA and Norm fine-tuning, resulting in only 47.8M learnable parameters, which is up to 260x fewer than existing methods. Extensive experiments show that MiniGPT-3D achieves SOTA on 3D object classification and captioning tasks, with significantly cheaper training costs. Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800. We are the first to explore the efficient 3D-LLM, offering new insights to the community. Code and weights are available at https://github.com/TangYuan96/MiniGPT-3D.

* 17 pages, 9 figures

Via

Access Paper or Ask Questions