Zhenbo Luo

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

May 22, 2025

Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Ruihua Song

Abstract:Large Language Models (LLMs) achieve superior performance through Chain-of-Thought (CoT) reasoning, but these token-level reasoning chains are computationally expensive and inefficient. In this paper, we introduce Compressed Latent Reasoning (CoLaR), a novel framework that dynamically compresses reasoning processes in latent space through a two-stage training approach. First, during supervised fine-tuning, CoLaR extends beyond next-token prediction by incorporating an auxiliary next compressed embedding prediction objective. This process merges embeddings of consecutive tokens using a compression factor randomly sampled from a predefined range, and trains a specialized latent head to predict distributions of subsequent compressed embeddings. Second, we enhance CoLaR through reinforcement learning (RL) that leverages the latent head's non-deterministic nature to explore diverse reasoning paths and exploit more compact ones. This approach enables CoLaR to: i) perform reasoning at a dense latent level (i.e., silently), substantially reducing reasoning chain length, and ii) dynamically adjust reasoning speed at inference time by simply prompting the desired compression factor. Extensive experiments across four mathematical reasoning datasets demonstrate that CoLaR achieves 14.1% higher accuracy than latent-based baseline methods at comparable compression ratios, and reduces reasoning chain length by 53.3% with only 4.8% performance degradation compared to explicit CoT method. Moreover, when applied to more challenging mathematical reasoning tasks, our RL-enhanced CoLaR demonstrates performance gains of up to 5.4% while dramatically reducing latent reasoning chain length by 82.8%. The code and models will be released upon acceptance.

* 15 pages, 8 figures
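
The embedding-merging step described in the abstract above can be illustrated with a minimal sketch. It assumes that "merging" means averaging each run of consecutive token embeddings, which the abstract does not specify; the function names `compress_embeddings` and `sample_factor` are hypothetical and not taken from the authors' code.

```python
import random


def compress_embeddings(embeddings, factor):
    """Merge each run of `factor` consecutive embeddings into one vector.

    Assumption: merging is done by averaging. The abstract only says that
    embeddings of consecutive tokens are merged.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    compressed = []
    for start in range(0, len(embeddings), factor):
        group = embeddings[start:start + factor]  # last group may be shorter
        dim = len(group[0])
        merged = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
        compressed.append(merged)
    return compressed


def sample_factor(low, high, rng=random):
    """Draw a compression factor uniformly from the inclusive range [low, high]."""
    return rng.randint(low, high)


# Five toy 2-dimensional token embeddings, compressed with factor 2.
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [9.0, 8.0]]
compressed = compress_embeddings(tokens, 2)
print(compressed)
```

With a factor of 2, the five toy embeddings shrink to three latent steps, which is the sense in which a larger factor shortens the reasoning chain.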

VOC-ReID: Vehicle Re-identification based on Vehicle-Orientation-Camera

May 15, 2020

Xiangyu Zhu, Zhenbo Luo, Pei Fu, Xiang Ji

Abstract:Vehicle re-identification is a challenging task due to high intra-class variances and small inter-class variances. In this work, we focus on the failure cases caused by similar background and shape. They pose a severe bias on similarity, making it easier to neglect fine-grained information. To reduce the bias, we propose an approach named VOC-ReID, taking the triplet vehicle-orientation-camera as a whole and reforming background/shape similarity as camera/orientation re-identification. First, we train models for vehicle, orientation and camera re-identification respectively. Then we use orientation and camera similarity as a penalty to get the final similarity. Besides, we propose a high-performance baseline boosted by a bag of tricks and weakly supervised data augmentation. Our algorithm achieves second place in vehicle re-identification at the NVIDIA AI City Challenge 2020.

* AICity2020 Challenge, CVPR 2020 workshop; code available on GitHub (link in abstract)
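
The penalty idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the penalty is assumed to be a weighted subtraction, and the function name `penalized_similarity` and the weights `w_orientation` and `w_camera` are hypothetical.

```python
def penalized_similarity(vehicle_sim, orientation_sim, camera_sim,
                         w_orientation=0.1, w_camera=0.1):
    """Lower a vehicle similarity score when orientation and camera also match.

    Assumption: the penalty is a weighted subtraction. The weights here are
    illustrative values, not values reported by the paper.
    """
    return vehicle_sim - w_orientation * orientation_sim - w_camera * camera_sim


# Candidate A looks slightly more similar, but it shares the query's viewpoint
# and camera, so part of that similarity may come from background and shape.
score_a = penalized_similarity(vehicle_sim=0.80, orientation_sim=0.9, camera_sim=0.9)
# Candidate B is seen from a different orientation and camera.
score_b = penalized_similarity(vehicle_sim=0.75, orientation_sim=0.1, camera_sim=0.2)

ranking = sorted([("A", score_a), ("B", score_b)], key=lambda item: item[1], reverse=True)
print(ranking)
```

In this toy case the penalty moves candidate B ahead of candidate A, even though A has the higher raw vehicle similarity.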

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

May 15, 2019

Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, Sungjin Kim

Abstract:Scene text detection attracts much attention in computer vision, because it can be widely used in many applications such as real-time text translation, automatic information entry, blind person assistance, robot sensing and so on. Though many methods have been proposed for horizontal and oriented texts, detecting irregular shape texts such as curved texts is still a challenging problem. To solve the problem, we propose a robust scene text detection method with adaptive text region representation. Given an input image, a text region proposal network is first used for extracting text proposals. Then, these proposals are verified and refined with a refinement network. Here, recurrent neural network based adaptive text region representation is proposed for text region refinement, where a pair of boundary points are predicted each time step until no new points are found. In this way, text regions of arbitrary shapes are detected and represented with adaptive number of boundary points. This gives more accurate description of text regions. Experimental results on five benchmarks, namely, CTW1500, TotalText, ICDAR2013, ICDAR2015 and MSRATD500, show that the proposed method achieves state-of-the-art in scene text detection.

Structured Knowledge Distillation for Semantic Segmentation

Mar 12, 2019

Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang

Abstract:In this paper, we investigate the knowledge distillation strategy for training small semantic segmentation networks by making use of large networks. We start from the straightforward scheme, pixel-wise distillation, which applies the distillation scheme adopted for image classification and performs knowledge distillation for each pixel separately. We further propose to distill the structured knowledge from large networks to small networks, motivated by the fact that semantic segmentation is a structured prediction problem. We study two structured distillation schemes: (i) pair-wise distillation that distills the pairwise similarities, and (ii) holistic distillation that uses GAN to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by extensive experiments on three scene parsing datasets: Cityscapes, CamVid and ADE20K.

* 10 pages, accepted at CVPR 2019
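
Pair-wise distillation, as named in the abstract above, can be sketched in a few lines. This sketch assumes cosine similarity between per-pixel feature vectors and a mean squared difference between the student and teacher similarity matrices; neither choice is stated in the abstract, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarities(features):
    """Return the n x n matrix of similarities between all pairs of pixels."""
    return [[cosine(fi, fj) for fj in features] for fi in features]


def pairwise_distillation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher similarity matrices.

    Assumption: squared error is used as the matching criterion.
    """
    s = pairwise_similarities(student_feats)
    t = pairwise_similarities(teacher_feats)
    n = len(s)
    return sum((s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


# Three "pixels". Teacher features are 2-D, student features are 3-D.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_good = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
student_bad = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

loss_good = pairwise_distillation_loss(student_good, teacher)
loss_bad = pairwise_distillation_loss(student_bad, teacher)
print(loss_good, loss_bad)
```

Because only the pixel-to-pixel similarity matrices are compared, the student and teacher feature dimensions do not need to match in this sketch.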

Deep Residual Text Detection Network for Scene Text

Nov 11, 2017

Xiangyu Zhu, Yingying Jiang, Shuli Yang, Xiaobing Wang, Wei Li, Pei Fu, Hua Wang, Zhenbo Luo

Abstract:Scene text detection is a challenging problem in computer vision. In this paper, we propose a novel text detection network based on prevalent object detection frameworks. In order to obtain stronger semantic feature, we adopt ResNet as feature extraction layers and exploit multi-level feature by combining hierarchical convolutional networks. A vertical proposal mechanism is utilized to avoid proposal classification, while regression layer remains working to improve localization accuracy. Our approach evaluated on ICDAR2013 dataset achieves F-measure of 0.91, which outperforms previous state-of-the-art results in scene text detection.

* IAPR International Conference on Document Analysis and Recognition (ICDAR) 2017

R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

Jun 30, 2017

Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu, Zhenbo Luo

Abstract:In this paper, we propose a novel method called Rotational Region CNN (R2CNN) for detecting arbitrary-oriented texts in natural scene images. The framework is based on Faster R-CNN [1] architecture. First, we use the Region Proposal Network (RPN) to generate axis-aligned bounding boxes that enclose the texts with different orientations. Second, for each axis-aligned text box proposed by RPN, we extract its pooled features with different pooled sizes and the concatenated features are used to simultaneously predict the text/non-text score, axis-aligned box and inclined minimum area box. At last, we use an inclined non-maximum suppression to get the detection results. Our approach achieves competitive results on text detection benchmarks: ICDAR 2015 and ICDAR 2013.

* 8 pages, 6 figures, 3 tables

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

May 07, 2017

Yifan Liu, Zengchang Qin, Zhenbo Luo, Hua Wang

Abstract:Recently, realistic image generation using deep neural networks has become a hot topic in machine learning and computer vision. Images can be generated at the pixel level by learning from a large collection of images. Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also a potential application in digital entertainment. In this paper, we investigate the sketch-to-image synthesis problem by using conditional generative adversarial networks (cGAN). We propose the auto-painter model which can automatically generate compatible colors for a sketch. The new model is not only capable of painting hand-drawn sketches with proper colors, but also allows users to indicate preferred colors. Experimental results on two sketch datasets show that the auto-painter performs better than existing image-to-image methods.

* 12 pages, 7 figures
