Zhaoning Zhang

National University of Defense Technology, Changsha, China

GS-I$^{3}$: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images

Mar 18, 2025

Tengfei Wang, Yongmao Hou, Zhaoning Zhang, Yiwei Xu, Zongqian Zhan, Xin Wang

Abstract: Accurate geometric surface reconstruction, which provides essential environmental information for navigation and manipulation tasks, is critical for enabling robotic self-exploration and interaction. Recently, 3D Gaussian Splatting (3DGS) has gained significant attention in surface reconstruction due to its impressive geometric quality and computational efficiency. While recent advances in novel view synthesis under inconsistent illumination using 3DGS have shown promise, robust surface reconstruction under such conditions is still being explored. To address this challenge, we propose a method called GS-3I. Specifically, to mitigate the 3D Gaussian optimization bias caused by underexposed regions in single-view images, we introduce a tone mapping correction framework based on a Convolutional Neural Network (CNN). Furthermore, inconsistent lighting across multi-view images, resulting from variations in camera settings and complex scene illumination, often leads to geometric constraint mismatches and deviations in the reconstructed surface. To overcome this, we propose a normal compensation mechanism that integrates reference normals extracted from single-view images with normals computed from multi-view observations to effectively constrain geometric inconsistencies. Extensive experimental evaluations demonstrate that GS-3I can achieve robust and accurate surface reconstruction across complex illumination scenarios, highlighting its effectiveness and versatility in addressing this challenge. Code: https://github.com/TFwang-9527/GS-3I

* Comments: This work has been submitted to the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) for possible publication

GS-3I: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images

Mar 16, 2025

Tengfei Wang, Yongmao Hou, Zhaoning Zhang, Yiwei Xu, Zongqian Zhan, Xin Wang

* This paper has been submitted to IROS 2025

Balance Divergence for Knowledge Distillation

Jan 14, 2025

Yafei Qi, Chen Wang, Zhaoning Zhang, Yaping Liu, Yongmin Zhang

Abstract: Knowledge distillation has been widely adopted in computer vision tasks, since it can effectively enhance the performance of lightweight student networks by leveraging the knowledge transferred from cumbersome teacher networks. Most existing knowledge distillation methods use Kullback-Leibler divergence to make the student network mimic the logit output probabilities of the teacher network. Nonetheless, these methods may neglect the negative parts of the teacher's "dark knowledge" because the divergence calculation may ignore the effect of the minute probabilities in the teacher's logit output. This deficiency may lead to suboptimal performance in logit mimicry during the distillation process and result in an imbalance of information acquired by the student network. In this paper, we investigate the impact of this imbalance and propose a novel method, named Balance Divergence Distillation. By introducing a compensatory operation using reverse Kullback-Leibler divergence, our method can improve the modeling of the extremely small values in the teacher's negative part while preserving the learning capacity for the positive part. Furthermore, we test the impact of different temperature coefficient adjustments, which may be applied to further balance the knowledge transfer. We evaluate the proposed method on several computer vision tasks, including image classification and semantic segmentation. The evaluation results show that our method achieves an accuracy improvement of 1%~3% for lightweight students on both the CIFAR-100 and ImageNet datasets, and a 4.55% improvement in mIoU for PSP-ResNet18 on the Cityscapes dataset. The experiments show that our method is a simple yet highly effective solution that can be smoothly applied to different knowledge distillation methods.
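
The contrast between forward and reverse Kullback-Leibler divergence described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weights `alpha` and `beta`, and the two temperatures `t_fwd` and `t_rev` are assumptions made for illustration, and the paper's actual loss may combine the terms differently.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def balanced_kd_loss(teacher_logits, student_logits,
                     t_fwd=4.0, t_rev=1.0, alpha=1.0, beta=1.0):
    """Hypothetical combination: forward KL plus a reverse-KL compensation term."""
    # Forward KL(teacher || student): weighted by teacher probabilities,
    # so classes the teacher considers very unlikely contribute almost nothing.
    forward = kl(softmax(teacher_logits, t_fwd), softmax(student_logits, t_fwd))
    # Reverse KL(student || teacher): weighted by student probabilities,
    # so it penalizes student mass placed where the teacher assigns tiny probability.
    reverse = kl(softmax(student_logits, t_rev), softmax(teacher_logits, t_rev))
    return alpha * forward + beta * reverse

# The student over-estimates the third class, which the teacher nearly rules out.
teacher = [8.0, 1.0, -4.0]
student = [8.0, 1.0, 1.0]
forward_only = kl(softmax(teacher), softmax(student))
reverse_only = kl(softmax(student), softmax(teacher))
print(forward_only < reverse_only)  # the reverse term reacts more strongly here
```

In this toy case the student's error sits entirely in a class with a minute teacher probability, so the forward term barely registers it while the reverse term does.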

Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference

Dec 25, 2024

Libo Zhang, Zhaoning Zhang, Baizhou Xu, Songzhu Mei, Dongsheng Li

Abstract: Due to the high resource demands of Large Language Models (LLMs), achieving widespread deployment on consumer-grade devices presents significant challenges. Personal or consumer-grade devices, including servers configured prior to the era of large-scale models, typically have relatively weak GPUs and relatively strong CPUs. However, most current methods primarily depend on GPUs for computation. Therefore, we propose Dovetail, an approach that deploys the draft model on the GPU to generate draft tokens while allowing the target model to perform parallel verification on the CPU, thereby improving the utilization of all available hardware resources and occupying less inter-device communication bandwidth. Accordingly, we have redesigned the draft model to better align with heterogeneous hardware characteristics. To this end, we implemented several optimizations: reducing the number of draft tokens to mitigate latency in parallel verification, increasing the depth of the draft model to enhance its predictive capacity, and introducing DGF (Dynamic Gating Fusion) to improve the integration of features and token embeddings. On the HumanEval benchmark, Dovetail achieved an inference speed of 5.86 tokens per second for LLaMA2-Chat-7B using 3GB of VRAM, representing an approximately 2.77x improvement over CPU-only inference. Furthermore, the inference speed increased to 8 tokens per second when utilizing 7GB of VRAM.

* 9 pages, 7 figures
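
The draft-then-verify loop that speculative decoding relies on can be sketched with toy models. This is an illustrative sketch only: `target_next` and `draft_next` are hypothetical deterministic stand-ins for the target and draft models, the draft length `k=2` is an arbitrary small value, and nothing here models the GPU/CPU placement or the DGF module described in the abstract.

```python
def target_next(tokens):
    """Hypothetical target model: a deterministic next token for a context."""
    return (sum(tokens) * 7 + len(tokens)) % 10

def draft_next(tokens):
    """Hypothetical draft model: agrees with the target except on some contexts."""
    guess = target_next(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 10

def greedy_decode(prompt, num_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(prompt, num_new, k=2):
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < num_new:
        # Draft phase: propose k tokens autoregressively with the cheap model.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # Verify phase: a real system scores all k positions in one parallel
        # target pass; here the pass is simulated position by position.
        target_passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target's own next token comes for free.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:len(prompt) + num_new], target_passes

spec_tokens, passes = speculative_decode([1, 2], num_new=12)
baseline_tokens = greedy_decode([1, 2], num_new=12)
print(spec_tokens == baseline_tokens, passes)
```

Because every accepted token is checked against the target, the output matches plain greedy decoding with the target model; the saving comes from needing fewer target passes than generated tokens.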

Deep Tiny Network for Recognition-Oriented Face Image Quality Assessment

Jun 09, 2021

Baoyun Peng, Min Liu, Heng Yang, Zhaoning Zhang, Dongsheng Li

Abstract: Face recognition has made significant progress in recent years due to deep convolutional neural networks (CNN). In many face recognition (FR) scenarios, face images are acquired from a sequence with huge intra-variations. These intra-variations, which are mainly caused by low-quality face images, make recognition performance unstable. Previous works have focused on ad-hoc methods to select frames from a video or on face image quality assessment (FIQA) methods, which consider only a particular distortion or a combination of several distortions. In this work, we present an efficient non-reference image quality assessment for FR that directly links image quality assessment (IQA) and FR. More specifically, we propose a new measurement to evaluate image quality without any reference. Based on the proposed quality measurement, we propose a deep Tiny Face Quality network (tinyFQnet) to learn a quality prediction function from data. We evaluate the proposed method for different powerful FR models on two classical video-based (or template-based) benchmarks: IJB-B and YTF. Extensive experiments show that, although the tinyFQnet is much smaller than the others, the proposed method outperforms state-of-the-art quality assessment methods in terms of effectiveness and efficiency.

Correlation Congruence for Knowledge Distillation

Apr 03, 2019

Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang

Abstract: Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint at the instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge transfer. In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only the instance-level information, but also the correlation between instances. Furthermore, a generalized kernel method based on Taylor series expansion is proposed to better capture the correlation between instances. Empirical experiments and ablation studies on image classification tasks (including CIFAR-100, ImageNet-1K) and metric learning tasks (including ReID and Face Recognition) show that the proposed CCKD substantially outperforms the original KD and achieves state-of-the-art accuracy compared with other SOTA KD-based methods. CCKD can be easily deployed in the majority of teacher-student frameworks, such as KD and hint-based learning methods.
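
The idea of matching correlations between instances, rather than only per-instance outputs, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain cosine similarity as the correlation measure, not the Taylor-expansion kernel from the paper, and the function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]

def correlation_matrix(embeddings):
    """Pairwise cosine similarity between all instances in a batch."""
    unit = [l2_normalize(e) for e in embeddings]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def correlation_congruence_loss(teacher_emb, student_emb):
    """Mean squared difference between teacher and student correlation matrices."""
    ct = correlation_matrix(teacher_emb)
    cs = correlation_matrix(student_emb)
    n = len(ct)
    return sum((ct[i][j] - cs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

# Two instances the teacher treats as unrelated (orthogonal embeddings).
teacher_batch = [[1.0, 0.0], [0.0, 1.0]]
same_structure = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]  # different size, same relations
collapsed = [[1.0, 0.0], [1.0, 0.0]]                 # student maps both to one point
print(correlation_congruence_loss(teacher_batch, same_structure))
print(correlation_congruence_loss(teacher_batch, collapsed))
```

Because the matrices are batch-by-batch, the teacher and student embeddings may have different dimensions, which is why the second batch can be three-dimensional.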

ThunderNet: Towards Real-time Generic Object Detection

Mar 28, 2019

Zheng Qin, Zeming Li, Zhaoning Zhang, Yiping Bao, Gang Yu, Yuxing Peng, Jian Sun

Abstract: Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which hinders them from real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks of previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representations, we design two efficient architecture blocks, the Context Enhancement Module and the Spatial Attention Module. Finally, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.

* 10 pages, 6 figures

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

Apr 10, 2018

Hao Yu, Zhaoning Zhang, Zheng Qin, Hao Wu, Dongsheng Li, Jun Zhao, Xicheng Lu

Abstract: Modern object detectors usually suffer from low accuracy, as foregrounds are drowned out by large numbers of backgrounds and become hard examples during training. Compared with proposal-based detectors, real-time detectors are in far more serious trouble, since to achieve real-time rates they forgo the region-proposal stage that filters out a majority of backgrounds. Although foregrounds as hard examples urgently need to be mined from the backgrounds, a considerable number of state-of-the-art real-time detectors, such as the YOLO series, have yet to profit from existing hard example mining methods, because these methods require detectors to satisfy a series of prerequisites. In this paper, we propose a general hard example mining method named Loss Rank Mining (LRM) to fill the gap. LRM is a general method for real-time detectors, as it uses the final feature map, which exists in all real-time detectors, to mine hard examples. With LRM, some elements representing easy examples in the final feature map are filtered out, and detectors are forced to concentrate on hard examples during training. Extensive experiments validate the effectiveness of our method. With our method, the improvements of the YOLOv2 detector on the auto-driving dataset KITTI and the more general dataset PASCAL VOC are over 5% and 2% mAP, respectively. In addition, LRM is the first hard example mining strategy that fits YOLOv2 well and makes it better suited to real scenarios where both real-time rates and accurate detection are strongly demanded.

* 8 pages, 6 figures
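
The filtering step described in this abstract, keeping only the elements that rank highest by loss, can be sketched in a few lines. This is an assumption-laden illustration: it treats the final feature map as a flat list of per-element losses and keeps a fixed number `keep` of them, and the function names are hypothetical. The paper's actual ranking and filtering procedure may differ.

```python
def loss_rank_mask(losses, keep):
    """Return a 0/1 mask that keeps the `keep` elements with the largest loss."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    kept = set(ranked[:keep])
    return [1 if i in kept else 0 for i in range(len(losses))]

def mined_loss(losses, keep):
    """Average loss over the kept (hard) elements only; easy elements are ignored."""
    mask = loss_rank_mask(losses, keep)
    return sum(l * m for l, m in zip(losses, mask)) / max(1, sum(mask))

# Per-element losses of a flattened (hypothetical) final feature map.
element_losses = [0.1, 2.0, 0.05, 1.5]
hard_mask = loss_rank_mask(element_losses, keep=2)
hard_loss = mined_loss(element_losses, keep=2)
print(hard_mask, hard_loss)
```

Elements with a zero in the mask contribute no gradient, so training effort concentrates on the highest-loss elements.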

Diagonalwise Refactorization: An Efficient Training Method for Depthwise Convolutions

Mar 27, 2018

Zheng Qin, Zhaoning Zhang, Dongsheng Li, Yiming Zhang, Yuxing Peng

Abstract: Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds. However, training depthwise convolution layers with GPUs is slow in current deep learning frameworks because their implementations cannot fully utilize the GPU capacity. To address this problem, in this paper we present an efficient method (called diagonalwise refactorization) for accelerating the training of depthwise convolution layers. Our key idea is to rearrange the weight vectors of a depthwise convolution into a large diagonal weight matrix so as to convert the depthwise convolution into one single standard convolution, which is well supported by the cuDNN library, itself highly optimized for GPU computations. We have implemented our training method in five popular deep learning frameworks. Evaluation results show that our proposed method gains $15.4\times$ training speedup on Darknet, $8.4\times$ on Caffe, $5.4\times$ on PyTorch, $3.5\times$ on MXNet, and $1.4\times$ on TensorFlow, compared to their original implementations of depthwise convolutions.

* 8 pages, 5 figures
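
The equivalence at the heart of this abstract, that a depthwise convolution equals a standard convolution whose weight is zero everywhere except on the channel diagonal, can be checked with a small pure-Python sketch. It uses 1D signals and nested lists for brevity, and the function names are hypothetical; it demonstrates the mathematical identity only, not the GPU speedup or the authors' implementation.

```python
def depthwise_conv1d(x, w):
    """x: [C][L] input, w: [C][K], one filter per channel (valid cross-correlation)."""
    channels, length, ksize = len(x), len(x[0]), len(w[0])
    return [[sum(x[c][t + k] * w[c][k] for k in range(ksize))
             for t in range(length - ksize + 1)]
            for c in range(channels)]

def standard_conv1d(x, weight):
    """x: [C_in][L], weight: [C_out][C_in][K]; each output sums over all inputs."""
    c_in, length = len(x), len(x[0])
    c_out, ksize = len(weight), len(weight[0][0])
    return [[sum(x[i][t + k] * weight[o][i][k]
                 for i in range(c_in) for k in range(ksize))
             for t in range(length - ksize + 1)]
            for o in range(c_out)]

def to_diagonal_weight(w):
    """Place each depthwise filter on the channel diagonal; all other entries are zero."""
    channels, ksize = len(w), len(w[0])
    return [[list(w[o]) if i == o else [0.0] * ksize for i in range(channels)]
            for o in range(channels)]

x = [[1.0, 2.0, 3.0, 4.0],
     [0.5, -1.0, 2.0, 0.0],
     [3.0, 3.0, -2.0, 1.0]]
w = [[1.0, -1.0],
     [2.0, 0.5],
     [0.0, 4.0]]
depthwise_out = depthwise_conv1d(x, w)
diagonal_out = standard_conv1d(x, to_diagonal_weight(w))
print(depthwise_out == diagonal_out)
```

The diagonal form performs extra multiplications by zero, so its benefit in practice depends on the standard convolution kernel being much better optimized than the depthwise one, which is the situation the abstract describes.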

Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications

Mar 24, 2018

Zheng Qin, Zhaoning Zhang, Shiqing Zhang, Hao Yu, Yuxing Peng

Abstract: Compact neural networks tend to exploit "sparsely-connected" convolutions such as depthwise convolution and group convolution for deployment in mobile applications. Compared with standard "fully-connected" convolutions, these convolutions are more computationally economical. However, "sparsely-connected" convolutions block the inter-group information exchange, which induces severe performance degradation. To address this issue, we present two novel operations named merging and evolution to leverage the inter-group information. Our key idea is to encode the inter-group information with a narrow feature map, then combine the generated features with the original network for better representation. Taking advantage of the proposed operations, we then introduce the Merging-and-Evolution (ME) module, an architectural unit specifically designed for compact networks. Finally, we propose a family of compact neural networks called MENet based on ME modules. Extensive experiments on the ILSVRC 2012 dataset and the PASCAL VOC 2007 dataset demonstrate that MENet consistently outperforms other state-of-the-art compact networks under different computational budgets. For instance, under a computational budget of 140 MFLOPs, MENet surpasses ShuffleNet by 1% and MobileNet by 1.95% on ILSVRC 2012 top-1 accuracy, and by 2.3% and 4.1% on PASCAL VOC 2007 mAP, respectively.

* 8 pages, 4 figures
