
Huajun Liu

BHViT: Binarized Hybrid Vision Transformer

Mar 05, 2025

Tian Gao, Zhiyuan Zhang, Yu Zhang, Huajun Liu, Kaijie Yin, Chengzhong Xu, Hui Kong

Abstract: Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs), offering a potential solution to the deployment challenges faced by Vision Transformers (ViTs) on edge devices. However, due to the structural differences between CNN and Transformer architectures, simply applying binary CNN strategies to ViT models leads to a significant performance drop. To tackle this challenge, we propose BHViT, a binarization-friendly hybrid ViT architecture, together with its fully binarized model, guided by three important observations. First, BHViT uses local information interaction and hierarchical feature aggregation from coarse to fine levels to address redundant computation stemming from excessive tokens. Then, a novel module based on shift operations is proposed to enhance the performance of the binary Multilayer Perceptron (MLP) module without significantly increasing computational overhead. In addition, an attention matrix binarization method based on quantization decomposition is proposed to evaluate token importance in the binarized attention matrix. Finally, we propose a regularization loss to address the inadequate optimization caused by the incompatibility between weight oscillation in the binary layers and the Adam optimizer. Extensive experimental results demonstrate that our proposed algorithm achieves state-of-the-art (SOTA) performance among binary ViT methods.

* Accepted by CVPR 2025
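The BHViT abstract does not spell out the binary arithmetic itself. As background only, here is a minimal sketch of a generic binary linear layer from the binary-CNN literature: sign binarization with a mean-absolute-value scaling factor (XNOR-Net style), dot products computed with XNOR and popcount, and a clipped straight-through estimator for the gradient. It is not BHViT's shift-based MLP module, its attention binarization, or its regularization loss, and every function name is hypothetical.

```python
def binarize(values):
    """Sign-binarize a vector; return (alpha, signs) with alpha = mean(|v|)."""
    alpha = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return alpha, signs


def ste_grad(x, upstream, clip=1.0):
    """Clipped straight-through estimator: sign() is treated as the identity
    for |x| <= clip, so the upstream gradient passes; elsewhere it is zeroed."""
    return upstream if abs(x) <= clip else 0.0


def to_bits(signs):
    """Pack a +/-1 vector into an int: bit k is 1 when signs[k] == +1."""
    bits = 0
    for k, s in enumerate(signs):
        if s > 0:
            bits |= 1 << k
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n  # each match contributes +1, each mismatch -1


def binary_linear(x, weight_rows):
    """Binarize activations and each weight row, then rescale the binary dot product."""
    ax, sx = binarize(x)
    xb, n = to_bits(sx), len(x)
    out = []
    for row in weight_rows:
        aw, sw = binarize(row)
        out.append(ax * aw * xnor_popcount_dot(xb, to_bits(sw), n))
    return out
```

For example, `binary_linear([1.0, -2.0], [[0.5, 0.5], [-1.0, 3.0]])` returns `[0.0, -6.0]`. The XNOR/popcount path gives the same value as a plain dot product of the ±1 vectors, which is what makes binary layers cheap on edge hardware.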

PICTS: A Novel Deep Reinforcement Learning Approach for Dynamic P-I Control in Scanning Probe Microscopy

Feb 11, 2025

Ziwei Wei, Shuming Wei, Qibin Zeng, Wanheng Lu, Huajun Liu, Kaiyang Zeng

Abstract: We have developed a Parallel Integrated Control and Training System (PICTS), which leverages deep reinforcement learning to dynamically adjust control strategies in real time for scanning probe microscopy (SPM) techniques.

* 21 pages, 6 figures
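The PICTS abstract describes P-I (proportional-integral) control whose strategy is adjusted in real time by a deep reinforcement learning agent. A minimal sketch of that interface follows, assuming a discrete PI controller whose gains are re-set at every step by a policy callable. The first-order plant, the gain values, and `fixed_policy` (a stand-in where a trained RL agent would go) are all hypothetical and are not the PICTS design.

```python
class PIController:
    """Discrete proportional-integral controller with externally settable gains."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def set_gains(self, kp, ki):
        # Hook where a learned policy could adjust the control strategy online.
        self.kp, self.ki = kp, ki

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


def fixed_policy(error):
    """Hypothetical stand-in for an RL policy: maps the current error to (kp, ki)."""
    return 2.0, 5.0


def simulate(setpoint=1.0, steps=2000, dt=0.01, tau=0.1, policy=fixed_policy):
    """Close the loop around a hypothetical first-order plant tau * dy/dt = -y + u,
    integrated with forward Euler steps; returns the final plant output."""
    ctrl = PIController(*policy(setpoint), dt)
    y = 0.0
    for _ in range(steps):
        ctrl.set_gains(*policy(setpoint - y))
        u = ctrl.step(setpoint, y)
        y += dt * (-y + u) / tau
    return y
```

In this sketch `simulate()` settles at the setpoint because the integral term removes steady-state error; replacing `fixed_policy` with a learned function of the error (or of richer probe signals) is where the reinforcement learning component would plug in.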

Graph Gain: A Concave-Hull Based Volumetric Gain for Robotic Exploration

Apr 22, 2022

Zezhou Sun, Huajun Liu, Chengzhong Xu, Hui Kong

Abstract: The existing volumetric gain for robotic exploration is calculated in the 3D occupancy map, whereas the sampling-based exploration method expands in the reachable (free) space. This inconsistency makes the existing volumetric gain calculation inappropriate for complete exploration of the environment. To address this issue, we propose a concave-hull based volumetric gain within a sampling-based exploration framework. The concave hull is constructed from the viewpoints generated by a Rapidly-exploring Random Tree (RRT) and the nodes that fail to expand. All space outside this concave hull is considered unknown. The volumetric gain is calculated from the viewpoint configuration rather than from the occupancy map. With the new volumetric gain, robots can avoid the inefficient or even erroneous exploration behavior caused by existing volumetric gain calculation methods. Our exploration method is evaluated against the existing state-of-the-art RRT-based method in a benchmark environment. In the evaluated environment, the average running time of our method is about 38.4% of that of the state-of-the-art method, and our method is more robust.
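The key idea in the Graph Gain abstract is that everything outside the hull counts as unknown, so gain can be computed from the hull instead of the occupancy map. A minimal 2D sketch of that idea follows, under loudly stated assumptions: the hull is supplied as a simple polygon (its construction from RRT viewpoints and failed nodes is not shown), area stands in for volume, and the gain is approximated by sampling a grid of points within the sensor range. The names `point_in_polygon` and `hull_gain` and the `grid_step` parameter are hypothetical.

```python
import math


def point_in_polygon(px, py, poly):
    """Ray-casting test: True if (px, py) lies inside the simple polygon poly."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def hull_gain(viewpoint, hull, sensor_range, grid_step=0.1):
    """Approximate area within sensor_range of viewpoint that lies OUTSIDE the hull,
    i.e. space treated as unknown (2D stand-in for volumetric gain)."""
    vx, vy = viewpoint
    steps = int(sensor_range / grid_step)
    gain = 0.0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            x, y = vx + i * grid_step, vy + j * grid_step
            if math.hypot(x - vx, y - vy) <= sensor_range and not point_in_polygon(x, y, hull):
                gain += grid_step ** 2
    return gain
```

With an L-shaped hull `[(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]`, a viewpoint at `(3, 3)` sits in the notch and still reports positive gain, whereas a convex hull of the same vertices would cover part of that notch and hide it.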

Polarized Self-Attention: Towards High-quality Pixel-wise Regression

Jul 08, 2021

Huajun Liu, Fuqiang Liu, Xinyi Fan, Dong Huang

Abstract: Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks. These regression problems are very challenging, particularly because they require modeling long-range dependencies on high-resolution inputs/outputs at low computational overhead in order to estimate highly nonlinear pixel-wise semantics. While attention mechanisms in Deep Convolutional Neural Networks (DCNNs) have become popular for boosting long-range dependencies, element-specific attention, such as Nonlocal blocks, is highly complex and noise-sensitive to learn, and most simplified attention hybrids try to reach the best compromise among multiple types of tasks. In this paper, we present the Polarized Self-Attention (PSA) block, which incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions. (2) Enhancement: composing non-linearity that directly fits the output distribution of typical fine-grained regression, such as the 2D Gaussian distribution (keypoint heatmaps) or the 2D Binomial distribution (binary segmentation masks). PSA appears to have exhausted the representation capacity within its channel-only and spatial-only branches, such that there are only marginal metric differences between its sequential and parallel layouts. Experimental results show that PSA boosts standard baselines by 2-4 points and boosts state-of-the-art methods by 1-2 points on 2D pose estimation and semantic segmentation benchmarks.
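To make "polarized filtering" concrete, here is a simplified sketch of one reading of the two PSA branches, not the authors' reference implementation. Assumptions: 1x1 convolutions are modeled as channel-mixing weight matrices, the LayerNorm and reshape details are omitted, and the weights are hand-picked; all names are hypothetical. The channel-only branch collapses the spatial dimension with a softmax-weighted sum while keeping C/2 channels internally; the spatial-only branch collapses channels via global pooling and a softmax while keeping every spatial position; each ends in a sigmoid gate.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def conv1x1(x, w):
    """1x1 conv as channel mixing: x is C_in rows of HW values, w is C_out rows of C_in weights."""
    return [[sum(w_o[c] * x[c][p] for c in range(len(x))) for p in range(len(x[0]))]
            for w_o in w]


def channel_only(x, wq, wv, wz):
    """Collapse HW fully, keep C/2 channels internally, reweight x per channel."""
    q = softmax(conv1x1(x, wq)[0])                                # 1 x HW, softmax over positions
    v = conv1x1(x, wv)                                            # C/2 x HW
    z = [[sum(vc[p] * q[p] for p in range(len(q)))] for vc in v]  # C/2 x 1
    gate = [sigmoid(row[0]) for row in conv1x1(z, wz)]            # C gates in (0, 1)
    return [[gate[c] * val for val in x[c]] for c in range(len(x))]


def spatial_only(x, wq, wv):
    """Collapse channels fully, keep all HW positions, reweight x per position."""
    q = softmax([sum(row) / len(row) for row in conv1x1(x, wq)])  # global pool -> C/2, softmax
    v = conv1x1(x, wv)                                             # C/2 x HW
    gate = [sigmoid(sum(q[c] * v[c][p] for c in range(len(v)))) for p in range(len(x[0]))]
    return [[gate[p] * row[p] for p in range(len(row))] for row in x]


def psa_parallel(x, ch_w, sp_w):
    a, b = channel_only(x, *ch_w), spatial_only(x, *sp_w)
    return [[s + t for s, t in zip(ra, rb)] for ra, rb in zip(a, b)]


def psa_sequential(x, ch_w, sp_w):
    return spatial_only(channel_only(x, *ch_w), *sp_w)
```

Both branches return a tensor with the same C x HW shape as the input, which is why the parallel layout can sum them and the sequential layout can chain them.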

DeepDA: LSTM-based Deep Data Association Network for Multi-Targets Tracking in Clutter

Jul 16, 2019

Huajun Liu, Hui Zhang, Christoph Mertz

Abstract: In this paper, we propose DeepDA, a Long Short-Term Memory (LSTM) neural network-based data association algorithm for multi-target tracking in clutter, to deal with the NP-hard combinatorial optimization problem of data association. Unlike classical data association methods, which involve complex models and require accurate prior knowledge of clutter density, filter covariance, associated gating, etc., data-driven deep learning methods have been extensively researched for this topic. First, the data association problem for multi-target tracking with an unknown number of targets, missed detections, and clutter, which goes beyond a one-to-one mapping between observations and targets, is formally redefined. Subsequently, an LSTM network is designed to learn the measurement-to-track association probability from noisy radar measurements and existing tracks. Moreover, after supervised training with backpropagation through time (BPTT) and the RMSprop optimization method, the LSTM-based data-driven deep neural network can obtain the association probability directly. Experimental results on simulated data show significant performance in terms of association ratio, target ID switching, and time consumption when tracking multiple targets, even when they cross each other in a complicated clutter environment.

* 8 pages, 12 figures
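As a hedged illustration of the output the DeepDA abstract describes (a measurement-to-track association probability), the sketch runs a tiny hand-written LSTM cell over the distances between one track's predicted position and each candidate measurement, emits one logit per measurement, and normalizes the logits together with a fixed missed-detection logit via softmax. The 1D setting, the distance feature, the random untrained weights, and the `missed_logit` slot are assumptions; DeepDA's actual inputs, architecture, and BPTT/RMSprop training are not reproduced here.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class TinyLSTM:
    """Hypothetical single-layer LSTM: scalar input, linear readout, random (untrained) weights."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # One row per gate unit, gates ordered i, f, o, g: [w_input, u_1..u_H, bias]
        self.rows = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 2)]
                     for _ in range(4 * hidden)]
        self.readout = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def run(self, inputs):
        """Standard LSTM recurrence; returns one readout logit per input step."""
        H = self.hidden
        h, c = [0.0] * H, [0.0] * H
        logits = []
        for x in inputs:
            pre = [r[0] * x + sum(r[1 + k] * h[k] for k in range(H)) + r[-1] for r in self.rows]
            i_g = [sigmoid(p) for p in pre[:H]]
            f_g = [sigmoid(p) for p in pre[H:2 * H]]
            o_g = [sigmoid(p) for p in pre[2 * H:3 * H]]
            g_g = [math.tanh(p) for p in pre[3 * H:]]
            c = [f_g[k] * c[k] + i_g[k] * g_g[k] for k in range(H)]
            h = [o_g[k] * math.tanh(c[k]) for k in range(H)]
            logits.append(sum(w * hk for w, hk in zip(self.readout, h)))
        return logits


def association_probs(track_pred, measurements, lstm, missed_logit=0.0):
    """Return [P(missed), P(meas_1), ..., P(meas_m)] for one track; the entries sum to 1."""
    distances = [abs(z - track_pred) for z in measurements]
    scores = [missed_logit] + lstm.run(distances)
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]
```

Under this sketch's conventions, index 0 is the missed-detection outcome, so a track need not be matched to any measurement, and measurements left unassigned can be treated as clutter or new targets; meaningful probabilities would of course require trained weights.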
