Siyu Xu

FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution

Jun 17, 2025

Siyu Xu, Wenjie Li, Guangwei Gao, Jian Yang, Guo-Jun Qi, Chia-Wen Lin

Abstract: Face super-resolution (FSR) under limited computational costs remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources and degraded FSR performance. CNNs are relatively sensitive to high-frequency facial features, such as component contours and facial outlines. Meanwhile, Mamba excels at capturing low-frequency features like facial color and fine-grained texture, and does so with lower complexity than Transformers. Motivated by these observations, we propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components and processes them via dedicated branches. For low-frequency regions, we introduce a Mamba-based Low-Frequency Enhancement Block (LFEB), which combines state-space attention with squeeze-and-excitation operations to extract low-frequency global interactions and emphasize informative channels. For high-frequency regions, we design a CNN-based Deep Position-Aware Attention (DPA) module to enhance spatially-dependent structural details, complemented by a lightweight High-Frequency Refinement (HFR) module that further refines frequency-specific representations. Through the above designs, our method achieves an excellent balance between FSR quality and model efficiency, outperforming existing approaches.

* 12 pages, 11 figures, 6 tables
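
The abstract does not say how FADPNet separates low- and high-frequency components, so the following is only a minimal sketch of one common way to do such a split, assuming a box blur as the low-pass filter and the residual as the high-frequency part. The function names `box_blur` and `split_frequencies` are hypothetical and are not taken from the paper.

```python
def box_blur(img, radius=1):
    """Mean filter with edge clamping; a crude low-pass filter (assumption)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) such that low + high reconstructs img."""
    low = box_blur(img, radius)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high
```

In a dual-path design of the kind described, `low` would feed the low-frequency branch and `high` the high-frequency branch; a flat region produces a zero `high` response, while edges and contours produce large ones.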

VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation

Feb 04, 2025

Siyu Xu, Yunke Wang, Chenghao Xia, Dihao Zhu, Tao Huang, Chang Xu

Abstract: Vision-Language-Action (VLA) models can process instructions and visual perception to directly generate actions as output in an end-to-end fashion, owing to their strong multi-modal reasoning capabilities. While the performance of VLA models is promising, their computational cost can be substantial. This poses a challenge for applying them to robotics tasks, which require real-time decision-making to respond quickly to environmental changes. Since robotic control involves sequential decision-making, the visual input often exhibits minimal variation between successive steps. A natural idea is to reuse the computational results of unchanged visual tokens from the last step. Motivated by this idea, we propose VLA-Cache, an efficient vision-language-action model. VLA-Cache incorporates a token-selection mechanism that compares the visual input at each step with the input from the previous step, adaptively identifying visual tokens with minimal changes. The computational results for these unchanged tokens are then reused in subsequent steps via KV-cache, thereby significantly improving the efficiency of the VLA-Cache model. Experimental results in both simulation (e.g., the LIBERO benchmark and SIMPLER) and on a real-world robot validate that VLA-Cache can achieve practical acceleration with minimal sacrifice in success rate.
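
The token-reuse idea can be illustrated with a small sketch. It assumes each visual token corresponds to a flattened image patch and uses mean absolute difference with a fixed threshold as the change measure; the abstract does not specify the actual criterion, and `select_static_tokens`, `update_cache`, `compute_kv`, and the threshold value are all hypothetical.

```python
def select_static_tokens(prev_patches, curr_patches, threshold=0.05):
    """Return indices of patches whose mean absolute change is below threshold."""
    static = []
    for i, (prev, curr) in enumerate(zip(prev_patches, curr_patches)):
        diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)
        if diff < threshold:
            static.append(i)
    return static


def update_cache(cache, static_idx, curr_patches, compute_kv):
    """Reuse cached entries for static tokens and recompute the others.

    `compute_kv` stands in for the expensive per-token computation.
    """
    static = set(static_idx)
    new_cache = []
    for i, patch in enumerate(curr_patches):
        if i in static and i < len(cache):
            new_cache.append(cache[i])          # unchanged token: reuse
        else:
            new_cache.append(compute_kv(patch))  # changed token: recompute
    return new_cache
```

With this structure, the number of calls to `compute_kv` per step scales with the number of changed patches rather than with the full token count, which is where the saving would come from when consecutive frames are nearly identical.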

Collage Prompting: Budget-Friendly Visual Recognition with GPT-4V

Mar 18, 2024

Siyu Xu, Yunke Wang, Daochang Liu, Chang Xu

Abstract: Recent advancements in generative AI have suggested that, given a visual prompt, GPT-4V can demonstrate significant proficiency in image recognition tasks. Despite its impressive capabilities, the financial cost associated with GPT-4V's inference presents a substantial barrier to its wide use. To address this challenge, our work introduces Collage Prompting, a budget-friendly prompting approach that concatenates multiple images into a single visual input. With a collage prompt, GPT-4V is able to perform image recognition on several images simultaneously. Based on the observation that the accuracy of GPT-4V's image recognition varies significantly with the order of images within the collage prompt, our method further learns to optimize the arrangement of images for maximum recognition accuracy. A graph predictor is trained to indicate the accuracy of each collage prompt, and we then propose an optimization method to navigate the search space of possible image arrangements. Experimental results across various datasets demonstrate that the cost-efficiency score of the collage prompt is much higher than that of the standard prompt. Additionally, a collage prompt with a learned arrangement achieves clearly better accuracy than a collage prompt with a random arrangement in GPT-4V's visual recognition.
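
As a rough illustration of what concatenating images under a chosen arrangement involves, the sketch below tiles equally sized images into a grid following an explicit ordering. Images are represented as plain 2-D lists of pixel values for simplicity; `make_collage`, `order`, and `cols` are hypothetical names, and the learned arrangement described in the abstract would simply supply a different `order`.

```python
def make_collage(images, order, cols):
    """Tile equally sized images (2-D lists) into a grid following `order`.

    `order[k]` is the index of the image placed in grid slot k,
    with slots filled row by row.
    """
    tile_h = len(images[0])
    tile_w = len(images[0][0])
    rows = -(-len(order) // cols)  # ceiling division
    canvas = [[0] * (cols * tile_w) for _ in range(rows * tile_h)]
    for slot, img_idx in enumerate(order):
        r, c = divmod(slot, cols)
        for y in range(tile_h):
            for x in range(tile_w):
                canvas[r * tile_h + y][c * tile_w + x] = images[img_idx][y][x]
    return canvas
```

Because one collage replaces several single-image requests, the per-image cost drops roughly with the number of tiles, while the choice of `order` is the variable the arrangement optimization would search over.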

A Self-adaptive LSAC-PID Approach based on Lyapunov Reward Shaping for Mobile Robots

Nov 03, 2021

Xinyi Yu, Siyu Xu, Yuehai Fan, Linlin Ou

Abstract: To solve the coupling problem of control loops and the adaptive parameter tuning problem in multi-input multi-output (MIMO) PID control systems, this paper proposes a self-adaptive LSAC-PID algorithm based on deep reinforcement learning (RL) and Lyapunov-based reward shaping. For complex and unknown mobile robot control environments, an RL-based MIMO PID hybrid control strategy is first presented. According to the dynamic information and environmental feedback of the mobile robot, the RL agent can output the optimal MIMO PID parameters in real time, without requiring a mathematical model or decoupling of the multiple control loops. Then, to improve the convergence speed of RL and the stability of mobile robots, a Lyapunov-based reward shaping soft actor-critic (LSAC) algorithm is proposed based on Lyapunov theory and the potential-based reward shaping method. The convergence and optimality of the algorithm are proved in terms of the policy evaluation and improvement steps of soft policy iteration. In addition, for line-following robots, the region growing method is improved to cope with forks and environmental interference. Comparison, testing, and cross-validation in both simulation and real-environment experiments show good performance of the proposed LSAC-PID tuning algorithm.

* 11 pages, 13 figures
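
Potential-based reward shaping adds the term γΦ(s') − Φ(s) to the environment reward. The sketch below shows that standard form with the potential set to the negative of a Lyapunov candidate. The quadratic candidate V(e) = ½‖e‖² over the tracking error is an assumption made for illustration; the abstract does not state which Lyapunov function the paper uses, and the function names are hypothetical.

```python
def lyapunov_candidate(error):
    """Quadratic Lyapunov candidate V(e) = 0.5 * ||e||^2 (assumed form)."""
    return 0.5 * sum(e * e for e in error)


def shaped_reward(reward, error, next_error, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s), with phi = -V."""
    phi = -lyapunov_candidate(error)
    phi_next = -lyapunov_candidate(next_error)
    return reward + gamma * phi_next - phi
```

With this choice the agent receives a bonus whenever a step reduces V, that is, whenever the tracking error shrinks, and a penalty when it grows.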

A Self-adaptive SAC-PID Control Approach based on Reinforcement Learning for Mobile Robots

Mar 19, 2021

Xinyi Yu, Yuehai Fan, Siyu Xu, Linlin Ou

Abstract: Proportional-integral-derivative (PID) control is the most widely used method in industrial control, robot control and other fields. However, traditional PID control performs poorly when the system cannot be accurately modeled and the operating environment varies in real time. To tackle these problems, we propose a self-adaptive model-free SAC-PID control approach based on reinforcement learning for automatic control of mobile robots. A new hierarchical structure is developed, which includes an upper controller based on soft actor-critic (SAC), one of the most competitive continuous control algorithms, and a lower controller based on an incremental PID controller. Soft actor-critic receives the dynamic information of the mobile robot as input, and simultaneously outputs the optimal parameters of the incremental PID controllers to compensate for the error between the path and the mobile robot in real time. In addition, a combination of the 24-neighborhood method and polynomial fitting is developed to improve the adaptability of the SAC-PID control method to complex environments. The effectiveness of the SAC-PID control method is verified on several paths of varying difficulty, both in Gazebo and on a real mecanum mobile robot. Furthermore, compared with fuzzy PID control, the SAC-PID method has the merits of strong robustness, generalization and real-time performance.

* 20 pages, 12 figures
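
For readers unfamiliar with the lower-level controller, the sketch below implements the textbook incremental (velocity-form) PID update, Δu = Kp(e_k − e_{k−1}) + Ki·e_k + Kd(e_k − 2e_{k−1} + e_{k−2}). Passing the gains on every call is an assumption made here to mirror the idea of an upper-level policy supplying parameters in real time; the class name and interface are hypothetical and not taken from the paper.

```python
class IncrementalPID:
    """Textbook incremental PID; gains are supplied at every step."""

    def __init__(self):
        self.e1 = 0.0  # error at step k-1
        self.e2 = 0.0  # error at step k-2
        self.u = 0.0   # accumulated control output

    def step(self, error, kp, ki, kd):
        # The increment depends only on the last three errors.
        du = (kp * (error - self.e1)
              + ki * error
              + kd * (error - 2.0 * self.e1 + self.e2))
        self.u += du
        self.e2, self.e1 = self.e1, error
        return self.u
```

Because only an increment is computed at each step, changing the gains between steps does not cause the jump in output that re-tuning a positional PID with a large accumulated integral term can produce.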

Multi-view registration of unordered range scans by fast correspondence propagation of multi-scale descriptors

Apr 21, 2018

Jihua Zhu, Siyu Xu, Zutao Jiang, Shanmin Pang, Jun Wang, Zhongyu Li

Abstract: This paper proposes a global approach for the multi-view registration of unordered range scans. Pair-wise registration is the basis of multi-view registration and is therefore pivotal. We first select a good descriptor and accelerate its correspondence propagation for pair-wise registration. Then, we design an effective rule to judge the reliability of pair-wise registration results. Subsequently, we propose a model augmentation method, which uses reliable pair-wise registration results to augment the model shape. Finally, multi-view registration is accomplished by alternating between pair-wise registration with reliability judgment and model augmentation. Experimental results on publicly available data sets show that this approach can automatically achieve the multi-view registration of unordered range scans with good accuracy and effectiveness.

Effective scaling registration approach by imposing the emphasis on the scale factor

Apr 28, 2017

Jihua Zhu, Siyu Xu, Jie Hou, Yaochen Li, Jun Wang, Huimin Lu

Abstract: This paper proposes an effective approach for the scaling registration of $m$-D point sets. Unlike rigid registration, scaling registration cannot be formulated as the common least-squares function because the scale factor makes the problem ill-posed. Therefore, this paper designs a novel objective function for the scaling registration problem. This objective function takes the form of a rational fraction, where the numerator is the least-squares error and the denominator is the square of the scale factor. By placing emphasis on the scale factor, the ill-posed problem can be avoided in scaling registration. Subsequently, the new objective function can be solved by the proposed scaling iterative closest point (ICP) algorithm, which can obtain the optimal scaling transformation. For practical applications, the scaling ICP algorithm is further extended to align partially overlapping point sets. Finally, the proposed approach is tested on public data sets and applied to merging grid maps of different resolutions. Experimental results demonstrate its superiority over previous approaches in efficiency and robustness.

* 22 pages, 8 figures, 2 tables
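
Read literally, the objective described in the abstract (least-squares error in the numerator, squared scale factor in the denominator) can be written as follows. The notation is assumed for illustration and is not taken from the paper: $p_i$ are the points being transformed, $q_{c(i)}$ their current correspondences in the other set, $s$ the scale factor, $R$ the rotation, and $t$ the translation.

```latex
\min_{s,\,R,\,t,\,c} \; J(s, R, t, c)
  = \frac{\sum_{i=1}^{N} \left\| s R p_i + t - q_{c(i)} \right\|_2^2}{s^2},
\qquad s > 0,\; R^{\top} R = I,\; \det(R) = 1 .
```

One plausible reading of the ill-posedness is that the plain numerator can be driven toward zero by letting $s$ shrink, since all transformed points then collapse onto a single target point; dividing by $s^2$ removes that incentive.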
