Abstract:Due to its long-range imaging, satellite imagery gives rise to a variety of scale-preferred tasks, such as tiny/small object detection, in which the precise localization and detection of small objects of interest remains challenging. In this article, we design a Knowledge Discovery Network (KDN) to implement renormalization group theory (RGT) for efficient feature extraction. Renormalized connections (RCs) on the KDN enable ``synergistic focusing'' of multi-scale features. Based on our observations of the KDN, we abstract a class of RCs with different connection strengths, called n21C, and generalize it to FPN-based multi-branch detectors. In a series of FPN experiments on scale-preferred tasks, we found that the ``divide-and-conquer'' idea of FPN severely hampers the detector from learning in the right direction, owing to the large number of large-scale negative samples and interference from background noise; moreover, these negative samples cannot be suppressed by the focal loss function. The RCs extend the multi-level-feature ``divide-and-conquer'' mechanism of FPN-based detectors to a wide range of scale-preferred tasks and enable synergistic effects of multi-level features on a specific learning goal. In addition, interference activations from both sources are greatly reduced, and the detector learns in a more correct direction. Extensive experiments on 17 well-designed detection architectures embedded with n21Cs across five different levels of scale-preferred tasks validate the effectiveness and efficiency of the RCs. In particular, E421C, the simplest linear form of RC, performs well in all tasks and satisfies the scaling property of RGT. We hope that our approach will help transfer a large number of well-designed detectors from the computer vision community to the remote sensing community.
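As a rough illustration of the idea (not the paper's exact formulation), a linear RC can be read as a fixed-weight, renormalized aggregation of FPN levels toward a target level. The 4:2:1 weights below are an assumption suggested only by the name E421C, and the level ordering is likewise illustrative:

```python
import torch
import torch.nn.functional as F

def renormalized_connection(features, weights=(4.0, 2.0, 1.0)):
    """Illustrative linear RC: aggregate multi-level FPN features toward
    the finest level with fixed connection strengths.

    `features` is a list of maps [P3, P4, P5] ordered fine-to-coarse; the
    weights are normalized so the aggregation preserves feature scale.
    """
    target_size = features[0].shape[-2:]
    w = torch.tensor(weights)
    w = w / w.sum()  # renormalize the connection strengths
    out = 0.0
    for wi, f in zip(w, features):
        # upsample coarser levels to the target resolution before mixing
        f = F.interpolate(f, size=target_size, mode="nearest")
        out = out + wi * f
    return out

# Usage with dummy pyramid levels of decreasing resolution
p3, p4, p5 = (torch.randn(1, 256, s, s) for s in (64, 32, 16))
fused = renormalized_connection([p3, p4, p5])  # shape (1, 256, 64, 64)
```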
Abstract:To overcome the inherent domain gap between remote sensing (RS) images and natural images, some self-supervised representation learning methods have made promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a \textit{scaling center crop} operation to create a rotated crop with a random orientation on each original image, introducing explicit angle variation. MA3E takes this composite image as input while reconstructing the original image, aiming to learn rotation-invariant representations by restoring the angle variation introduced on the rotated crop. To avoid the bias caused by directly reconstructing the rotated crop, we propose an Optimal Transport (OT) loss that automatically assigns similar original image patches to each rotated crop patch for reconstruction. MA3E demonstrates more competitive performance than existing pre-training methods on seven RS image datasets across three downstream tasks.
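A minimal sketch of what the crop-and-rotate step might look like, assuming the composite image is produced by rotating a center crop in place and pasting it back; the crop ratio and pasting details are illustrative assumptions rather than MA3E's exact recipe:

```python
import torch
import torchvision.transforms.functional as TF

def scaling_center_crop(img, crop_ratio=0.5, angle=None):
    """Sketch of a composite image with explicit angle variation: take a
    center crop, rotate it by a random angle, and paste it back.

    `img` is a (C, H, W) tensor; returns the composite and the angle so a
    pre-training objective can try to restore the variation.
    """
    if angle is None:
        angle = float(torch.empty(1).uniform_(0.0, 360.0))
    _, h, w = img.shape
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[:, top:top + ch, left:left + cw]
    rotated = TF.rotate(crop, angle)  # same size, corners filled with zeros
    composite = img.clone()
    composite[:, top:top + ch, left:left + cw] = rotated
    return composite, angle
```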
Abstract:Capturing the dependencies between joints is critical in the skeleton-based action recognition task. The Transformer shows great potential for modeling the correlation of important joints. However, existing Transformer-based methods cannot capture the correlation of different joints between frames, even though this correlation is very useful, since different body parts (such as the arms and legs in "long jump") move together across adjacent frames. Focusing on this problem, a novel spatio-temporal tuples Transformer (STTFormer) method is proposed. The skeleton sequence is divided into several parts, and the consecutive frames contained in each part are encoded. A spatio-temporal tuples self-attention module is then proposed to capture the relationship of different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the ability to distinguish similar actions. Compared with state-of-the-art methods, our method achieves better performance on two large-scale datasets.
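As a sketch of the tuple idea, assuming each tuple covers a fixed number of consecutive frames and self-attention runs jointly over all joints within a tuple; the dimensions and module configuration are illustrative, not STTFormer's exact design:

```python
import torch
import torch.nn as nn

class TupleSelfAttention(nn.Module):
    """Tuple-wise self-attention: group n consecutive frames into a tuple
    and attend over every joint in the tuple, so correlations between
    different joints across adjacent frames can be captured."""

    def __init__(self, dim, frames_per_tuple=4, heads=4):
        super().__init__()
        self.n = frames_per_tuple
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, T, V, dim) with T divisible by frames_per_tuple
        b, t, v, d = x.shape
        # tokens within a tuple = all joints of n consecutive frames
        x = x.view(b * (t // self.n), self.n * v, d)
        out, _ = self.attn(x, x, x)
        return out.view(b, t, v, d)

# Usage: 16 frames, 25 joints, 64-d joint embeddings
feats = torch.randn(2, 16, 25, 64)
out = TupleSelfAttention(dim=64)(feats)  # shape (2, 16, 25, 64)
```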
Abstract:Existing anchor-based oriented object detection methods have achieved amazing results, but they require manually preset boxes, which introduce additional hyperparameters and computation. Existing anchor-free methods usually have complex architectures and are not easy to deploy. Our goal is to propose an algorithm that is simple and easy to deploy for aerial image detection. In this paper, we present a one-stage anchor-free rotated object detector (FCOSR) based on FCOS, which can be deployed on most platforms. FCOSR has a simple architecture consisting of only convolution layers. Our work focuses on the label assignment strategy for the training phase. We use an ellipse center sampling method to define a suitable sampling region for the oriented bounding box (OBB). A fuzzy sample assignment strategy provides reasonable labels for overlapping objects. To solve the insufficient-sampling problem, a multi-level sampling module is designed. These strategies allocate more appropriate labels to training samples. Our algorithm achieves 79.25, 75.41, and 90.15 mAP on the DOTA1.0, DOTA1.5, and HRSC2016 datasets, respectively. FCOSR demonstrates performance superior to other methods in single-scale evaluation. We convert a lightweight FCOSR model to TensorRT format, which achieves 73.93 mAP on DOTA1.0 at a speed of 10.68 FPS on a Jetson Xavier NX at single scale. The code is available at: https://github.com/lzh420202/FCOSR
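A minimal sketch of ellipse center sampling, assuming positive samples are the points that fall inside a shrunken ellipse inscribed in the OBB; the shrink factor is an illustrative hyperparameter, not necessarily FCOSR's setting:

```python
import numpy as np

def ellipse_center_sampling(points, cx, cy, w, h, theta, shrink=0.5):
    """Return a boolean mask over candidate points (N, 2) that lie inside
    a shrunken ellipse inscribed in the oriented box (cx, cy, w, h, theta),
    with theta in radians."""
    dx, dy = points[:, 0] - cx, points[:, 1] - cy
    # rotate offsets into the box's local frame
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    a, b = shrink * w / 2.0, shrink * h / 2.0  # shrunken semi-axes
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

# Usage: which grid points are positives for a 40x20 box rotated 30 degrees
pts = np.stack(np.meshgrid(np.arange(64), np.arange(64)), -1).reshape(-1, 2)
mask = ellipse_center_sampling(pts.astype(float), 32, 32, 40, 20, np.pi / 6)
```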
Abstract:Feature alignment between domains is one of the mainstream methods for Unsupervised Domain Adaptation (UDA) semantic segmentation. Existing feature alignment methods for semantic segmentation learn domain-invariant features by adversarial training to reduce domain discrepancy, but they have two limitations: 1) associations among pixels are not maintained, and 2) the classifier trained on the source domain cannot adapt well to the target domain. In this paper, we propose a new UDA semantic segmentation approach based on the domain closeness assumption to alleviate these problems. Specifically, a prototype clustering strategy is applied to cluster pixels with the same semantics, which better maintains associations among target-domain pixels during feature alignment. After clustering, to make the classifier more adaptive, a normalized cut loss based on the affinity graph of the target domain is utilized, which makes the decision boundary target-specific. Extensive experiments conducted on GTA5 $\rightarrow$ Cityscapes and SYNTHIA $\rightarrow$ Cityscapes demonstrate the effectiveness of our method, which achieves new state-of-the-art results.
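For intuition, a soft relaxation of the normalized cut over an affinity graph can be written as below; the dense affinity matrix and this particular relaxation are illustrative assumptions, not necessarily the paper's exact formulation:

```python
import torch

def normalized_cut_loss(probs, affinity):
    """Soft normalized-cut loss: probs is the (N, K) soft class assignment
    of target-domain pixels, affinity is an (N, N) similarity matrix.
    Minimizing it pushes the decision boundary through low-affinity
    regions of the target domain. The relaxed normalized cut is
    K - sum_k (S_k^T W S_k) / (d^T S_k), with degrees d = W 1."""
    degree = affinity.sum(dim=1)                       # (N,)
    assoc = torch.einsum("nk,nm,mk->k", probs, affinity, probs)
    denom = probs.t() @ degree                         # (K,)
    return probs.shape[1] - (assoc / denom.clamp_min(1e-8)).sum()

# Usage with random pixels: similar features get high affinity
feats = torch.randn(128, 16)
aff = torch.exp(-torch.cdist(feats, feats) ** 2)
probs = torch.softmax(torch.randn(128, 5), dim=1)
loss = normalized_cut_loss(probs, aff)
```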
Abstract:Due to the limited amount and imbalanced classes of labeled training data, conventional supervised learning cannot ensure the discriminability of the learned features for hyperspectral image (HSI) classification. In this paper, we propose a modified diversity of class probability estimation (MDCPE) with two deep neural networks to learn spectral-spatial features for HSI classification. In the co-training phase, a recurrent neural network (RNN) and a convolutional neural network (CNN) are utilized as two learners to extract features from labeled and unlabeled data. Based on the extracted features, MDCPE selects the most credible samples to update the initial labeled data by combining k-means clustering with the traditional diversity of class probability estimation (DCPE) co-training. In this way, MDCPE keeps the newly labeled data class-balanced and extracts discriminative features for both the minority and majority classes. During the testing process, classification results are acquired by the co-decision of the two learners. Experimental results demonstrate that the proposed semi-supervised co-training method makes full use of unlabeled information to enhance the generalization of the learners, and it achieves favorable accuracies on three widely used datasets: Salinas, Pavia University, and Pavia Center.
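A hedged sketch of class-balanced pseudo-label selection in the spirit of MDCPE follows; the agreement and confidence criteria, and the way k-means filters inconsistent samples, are illustrative stand-ins for the paper's DCPE-based rule rather than its exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_credible_samples(probs_a, probs_b, feats, n_per_class, n_classes):
    """Pick an equal number of credible unlabeled samples per class.

    probs_a/probs_b: (N, K) class probabilities from the two learners
    (e.g., RNN and CNN); feats: (N, D) features used for clustering.
    A sample is kept if the learners agree, its k-means cluster is
    dominated by the same class, and its confidence ranks highest."""
    pred_a, pred_b = probs_a.argmax(1), probs_b.argmax(1)
    agree = pred_a == pred_b
    conf = (probs_a + probs_b).max(1) / 2.0
    clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(feats)
    majority = {c: np.bincount(pred_a[clusters == c],
                               minlength=n_classes).argmax()
                for c in range(n_classes)}
    consistent = np.array([pred_a[i] == majority[clusters[i]]
                           for i in range(len(pred_a))])
    selected = []
    for c in range(n_classes):
        idx = np.where(agree & consistent & (pred_a == c))[0]
        idx = idx[np.argsort(-conf[idx])][:n_per_class]  # most confident
        selected.extend(idx.tolist())
    return np.array(selected)
```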
Abstract:This paper develops a novel iterative framework for subspace clustering in a learned discriminative feature domain. The framework consists of two modules: fuzzy sparse subspace clustering and discriminative transformation learning. In the first module, fuzzy latent labels containing discriminative information and latent representations capturing the subspace structure are simultaneously estimated in a feature domain. The linear transformation operator defining the feature domain is then successively updated in the second module, gaining greater discrimination, subspace-structure preservation, and robustness to outliers. These two modules are carried out alternately, and both theoretical analysis and empirical evaluations demonstrate the framework's effectiveness and superiority. In particular, experimental results on three benchmark databases for subspace clustering clearly illustrate that the proposed framework achieves significant improvements over other state-of-the-art approaches in terms of clustering accuracy.
Abstract:Dictionary learning frameworks based on the linear synthesis model have achieved remarkable performance in image classification over the last decade. However, behaving as a generative feature model, they suffer from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector can be extracted much more efficiently. Additionally, we provide a deeper insight, demonstrating that NACM is capable of simultaneously learning a task-adapted feature transformation and regularization that encode our preferences, domain prior knowledge, and task-oriented supervised information into the features. The proposed NACM is devoted to the classification task as a discriminative feature model and yields a novel discriminative nonlinear analysis operator learning framework (DNAOL). Theoretical analysis and experimental results clearly demonstrate that DNAOL not only achieves better, or at least competitive, classification accuracy compared with state-of-the-art algorithms but also dramatically reduces the time complexity of both the training and testing phases.
Abstract:The sparsity-regularized synthetic aperture radar (SAR) imaging framework has shown remarkable performance in generating feature-enhanced, high-resolution images, in which a sparsity-inducing regularizer exploits the sparsity priors of visual features in the underlying image. However, since simple priors on low-level features are insufficient to describe the different semantic contents of the image, this type of regularizer is incapable of distinguishing between targets of interest and unconcerned background clutter. As a consequence, features belonging to both the target and the clutter are affected simultaneously in the generated image, regardless of their underlying semantic labels. To address this problem, we propose a novel semantic-information-guided framework for target-oriented SAR image formation, which aims to enhance the scatterers of interest while suppressing background clutter. First, we develop a new semantics-specific regularizer for image formation by exploiting the statistical properties of different semantic categories in a target-scene SAR image. To infer the semantic label of each pixel in an unsupervised way, we further introduce a novel high-level prior-driven regularizer and semantic causal rules derived from prior knowledge. Finally, our regularized framework for image formation is derived as a simple iteratively reweighted $\ell_1$ minimization problem, which can be conveniently solved by many off-the-shelf solvers. Experimental results demonstrate the effectiveness and superiority of our framework for SAR image formation in terms of target enhancement and clutter suppression, compared with the state of the art. Additionally, the proposed framework opens a new direction of incorporating machine learning strategies into image formation, which can benefit subsequent decision-making tasks.
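For reference, the iteratively reweighted $\ell_1$ scheme that such a framework reduces to can be sketched as follows; the generic linear operator `A` stands in for the SAR observation model, and ISTA is used as an off-the-shelf inner solver (both are illustrative choices, not the paper's specific setup):

```python
import numpy as np

def irl1_reconstruction(A, y, lam=0.1, n_outer=5, n_inner=50, eps=1e-3):
    """Iteratively reweighted l1 minimization of
    ||y - A x||^2 + lam * sum_i w_i |x_i|:
    each outer iteration reweights the penalty from the current estimate
    (w_i = 1 / (|x_i| + eps)), and each inner loop runs ISTA with a
    weighted soft threshold."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_outer):
        w = 1.0 / (np.abs(x) + eps)  # small coefficients get pushed harder
        for _ in range(n_inner):
            grad = A.T @ (A @ x - y)
            z = x - step * grad
            thresh = step * lam * w
            x = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return x

# Usage: recover a sparse scene from noisy linear measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((128, 256))
x_true = np.zeros(256); x_true[rng.choice(256, 10, replace=False)] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(128)
x_hat = irl1_reconstruction(A, y)
```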