Hanqing Liu

Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos

Apr 21, 2025

Songping Wang, Hanqing Liu, Yueming Lyu, Xiantao Hu, Ziwen He, Wei Wang, Caifeng Shan, Liang Wang

Abstract:Adversarial Training (AT) has been shown to significantly enhance adversarial robustness via a min-max optimization approach. However, its effectiveness in video recognition tasks is hampered by two main challenges. First, fast adversarial training for video models remains largely unexplored, which severely impedes its practical applications. Specifically, most video adversarial training methods are computationally costly, with long training times and high expenses. Second, existing methods struggle with the trade-off between clean accuracy and adversarial robustness. To address these challenges, we introduce Video Fast Adversarial Training with Weak-to-Strong consistency (VFAT-WS), the first fast adversarial training method for video data. Specifically, VFAT-WS incorporates the following key designs: First, it integrates a straightforward yet effective temporal frequency augmentation (TF-AUG), and its spatial-temporal enhanced form STF-AUG, along with a single-step PGD attack to boost training efficiency and robustness. Second, it devises a weak-to-strong spatial-temporal consistency regularization, which seamlessly integrates the simpler TF-AUG and the more complex STF-AUG. Leveraging the consistency regularization, it steers the learning process from simple to complex augmentations. Both of them work together to achieve a better trade-off between clean accuracy and robustness. Extensive experiments on UCF-101 and HMDB-51 with both CNN and Transformer-based models demonstrate that VFAT-WS achieves great improvements in adversarial robustness and corruption robustness, while accelerating training by nearly 490%.
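The abstract pairs its augmentations with a single-step PGD attack to keep adversarial training cheap. As a generic illustration (not the paper's code), one such step usually means: a random start inside the L-infinity ball, one signed-gradient ascent step, and projection back onto the ball. The sketch below assumes a hypothetical toy logistic-regression model in place of a video network. The `eps = 8/255` budget and the `1.25 * eps` step size are values commonly used in fast adversarial training, not VFAT-WS's configuration. TF-AUG, STF-AUG, and the consistency regularization are not shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bce(w, b, x, y):
    """Binary cross-entropy of a hypothetical logistic model p = sigmoid(w.x + b)."""
    p = sigmoid(logit(w, b, x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def input_grad(w, b, x, y):
    """Gradient of the BCE loss with respect to the input: dL/dx = (p - y) * w."""
    p = sigmoid(logit(w, b, x))
    return [(p - y) * wi for wi in w]


def single_step_pgd(w, b, x, y, eps=8 / 255, alpha=None, seed=0):
    """One-step PGD with a random start; inputs are assumed to lie in [0, 1]."""
    alpha = 1.25 * eps if alpha is None else alpha
    rng = random.Random(seed)
    # 1) Random start inside the L-infinity ball of radius eps.
    delta = [rng.uniform(-eps, eps) for _ in x]
    x_start = [xi + di for xi, di in zip(x, delta)]
    # 2) A single signed-gradient ascent step on the loss.
    g = input_grad(w, b, x_start, y)
    delta = [di + alpha * sign(gi) for di, gi in zip(delta, g)]
    # 3) Project back onto the eps-ball, then onto the valid input range [0, 1].
    delta = [max(-eps, min(eps, di)) for di in delta]
    return [max(0.0, min(1.0, xi + di)) for xi, di in zip(x, delta)]
```

In a real training loop the model would then be updated on the returned adversarial input. Only one input-gradient computation is spent per example, which is where single-step methods save time compared with multi-step PGD.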

scAgent: Universal Single-Cell Annotation via a LLM Agent

Apr 07, 2025

Yuren Mao, Yu Mi, Peigen Liu, Mengfei Zhang, Hanqing Liu, Yunjun Gao

Abstract:Cell type annotation is critical for understanding cellular heterogeneity. Based on single-cell RNA-seq data and deep learning models, good progress has been made in annotating a fixed number of cell types within a specific tissue. However, universal cell annotation, which can generalize across tissues, discover novel cell types, and extend to novel cell types, remains less explored. To fill this gap, this paper proposes scAgent, a universal cell annotation framework based on Large Language Models (LLMs). scAgent can identify cell types and discover novel cell types in diverse tissues; furthermore, it is data-efficient when learning novel cell types. Experimental studies on 160 cell types and 35 tissues demonstrate the superior performance of scAgent in general cell-type annotation, novel cell discovery, and extensibility to novel cell types.

When Lighting Deceives: Exposing Vision-Language Models' Illumination Vulnerability Through Illumination Transformation Attack

Mar 10, 2025

Hanqing Liu, Shouwei Ruan, Yao Huang, Shiji Zhao, Xingxing Wei

Abstract:Vision-Language Models (VLMs) have achieved remarkable success in various tasks, yet their robustness to real-world illumination variations remains largely unexplored. To bridge this gap, we propose Illumination Transformation Attack (ITA), the first framework to systematically assess VLMs' robustness against illumination changes. However, two key challenges remain: (1) how to model global illumination with fine-grained control to achieve diverse lighting conditions and (2) how to ensure adversarial effectiveness while maintaining naturalness. To address the first challenge, we decompose global illumination into multiple parameterized point light sources based on the illumination rendering equation. This design enables us to model more diverse lighting variations that previous methods could not capture. Then, by integrating these parameterized lighting variations with physics-based lighting reconstruction techniques, we can precisely render such light interactions in the original scenes, achieving fine-grained lighting control. For the second challenge, by controlling illumination through the lighting reconstruction model's latent space rather than direct pixel manipulation, we inherently preserve physical lighting priors. Furthermore, to prevent potential reconstruction artifacts, we design additional perceptual constraints for maintaining visual consistency with the original images and diversity constraints for avoiding light source convergence. Extensive experiments demonstrate that ITA can significantly reduce the performance of advanced VLMs, e.g., LLaVA-1.6, while maintaining competitive naturalness, exposing VLMs' critical illumination vulnerabilities.
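The idea of decomposing global illumination into parameterized point light sources can be illustrated with the textbook Lambertian point-light model: each light contributes `intensity * max(0, cos(theta)) / distance**2` at a surface point, and contributions add linearly. This is a deliberate simplification (no shadows, no specular terms, no learned reconstruction model) and is not ITA's renderer; `PointLight` and `shade_lambertian` are hypothetical names. Each light's position and intensity are the kind of parameters an illumination attack could search over.

```python
import math
from dataclasses import dataclass


@dataclass
class PointLight:
    position: tuple  # (x, y, z) in world coordinates
    intensity: float  # radiant intensity, arbitrary units


def shade_lambertian(point, normal, albedo, lights):
    """Diffuse shading at `point` from a sum of point lights.

    `normal` is assumed to be unit length. Each light contributes
    intensity * max(0, cos(theta)) / distance**2 (inverse-square falloff),
    and the contributions of all lights are summed.
    """
    total = 0.0
    for light in lights:
        to_light = [lp - p for lp, p in zip(light.position, point)]
        dist2 = sum(c * c for c in to_light)
        cos_theta = sum(n * c for n, c in zip(normal, to_light)) / math.sqrt(dist2)
        total += light.intensity * max(0.0, cos_theta) / dist2
    return albedo * total
```

Because contributions are additive, moving or re-weighting one light changes the shading smoothly wherever that light faces the surface, which makes a light-source parameterization convenient for optimization.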

Global Challenge for Safe and Secure LLMs Track 1

Nov 21, 2024

Xiaojun Jia, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, Mun-Thye Mak, Xin Ming Sim, Wee Siong Ng, See Kiong Ng, Hanqing Liu (+20 more)

Abstract:This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise techniques that could bypass content safeguards across a diverse array of scenarios, from offensive language to misinformation and illegal activities. Through this process, Track 1 aimed to deepen the understanding of LLM vulnerabilities and provide insights for creating more resilient models.

Boosting Jailbreak Transferability for Large Language Models

Oct 21, 2024

Hanqing Liu, Lifeng Zhou, Huanqian Yan

Abstract:Large language models have drawn significant attention to the challenge of safe alignment, especially regarding jailbreak attacks that circumvent security measures to produce harmful content. To address the limitations of existing methods like GCG, which perform well in single-model attacks but lack transferability, we propose several enhancements, including a scenario induction template, optimized suffix selection, and a re-suffix attack mechanism to reduce inconsistent outputs. Our approach has shown superior performance in extensive experiments across various benchmarks, achieving nearly 100% success rates in both attack execution and transferability. Notably, our method won first place in the online stage of the AISG-hosted Global Challenge for Safe and Secure LLMs.

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

Apr 18, 2024

Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

Abstract:Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder their use in real-world applications. This paper successfully addresses this concern while keeping VLPs' original performance by breaking through two primary obstacles: 1) the scarcity of training data and 2) the suboptimal fine-tuning paradigms. To combat data scarcity, we build the Multi-View Caption (MVCap) dataset -- a comprehensive collection of over four million multi-view image-text pairs across more than 100K objects, providing more potential for VLP models to develop generalizable viewpoint-invariant representations. To address the limitations of existing paradigms in performance trade-offs and training efficiency, we design a novel fine-tuning framework named Omniview-Tuning (OVT). Specifically, OVT introduces a Cross-Viewpoint Alignment objective through a minimax-like optimization strategy, which effectively aligns representations of identical objects from diverse viewpoints without causing overfitting. Additionally, OVT fine-tunes VLP models in a parameter-efficient manner, leading to minimal computational cost. Extensive experiments on various VLP models with different architectures validate that OVT significantly improves the models' resilience to viewpoint shifts and keeps the original performance, establishing a pioneering standard for boosting the viewpoint invariance of VLP models.

* 20 pages
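The abstract describes a Cross-Viewpoint Alignment objective optimized in a "minimax-like" way. One plausible reading, assumed here and not taken from the paper: the inner max selects the viewpoint whose embedding is currently least aligned with the object's mean embedding, and the outer min trains the encoder to reduce that worst-case misalignment. The sketch computes only the inner worst-view loss on plain lists of floats; the mean-embedding anchor, the cosine distance, and the function names are all hypothetical, and embeddings are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def worst_view_alignment_loss(view_embeddings):
    """Return (loss, index) for the view least aligned with the object's mean embedding.

    Inner 'max': pick the view with the largest 1 - cos(view, mean).
    The outer 'min' (not shown) would update the encoder to shrink this loss.
    """
    n = len(view_embeddings)
    dim = len(view_embeddings[0])
    anchor = [sum(e[d] for e in view_embeddings) / n for d in range(dim)]
    losses = [1.0 - cosine(e, anchor) for e in view_embeddings]
    worst = max(range(n), key=lambda i: losses[i])
    return losses[worst], worst
```

Focusing the gradient on the hardest view is one way such an objective could avoid overfitting to viewpoints that are already well aligned; whether OVT does exactly this is not stated in the abstract.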

Enriched Physics-informed Neural Networks for Dynamic Poisson-Nernst-Planck Systems

Feb 01, 2024

Xujia Huang, Fajie Wang, Benrong Zhang, Hanqing Liu

Abstract:This paper proposes a meshless deep learning algorithm, enriched physics-informed neural networks (EPINNs), to solve dynamic Poisson-Nernst-Planck (PNP) equations with strong coupling and nonlinear characteristics. EPINNs take traditional physics-informed neural networks as the foundational framework and add adaptive loss weights to balance the loss functions; the weights are assigned automatically by updating their parameters in each iteration based on maximum likelihood estimation. A resampling strategy is employed in EPINNs to accelerate the convergence of the loss function, and GPU parallel computing is adopted to accelerate the solving process. Four examples are provided to demonstrate the validity and effectiveness of the proposed method. Numerical results indicate that the new method has better applicability than traditional numerical methods in solving such coupled nonlinear systems. More importantly, EPINNs are more accurate, more stable, and faster than traditional physics-informed neural networks. This work provides a simple and high-performance numerical tool for addressing PNP systems with arbitrary boundary shapes and boundary conditions.

* 24 pages, 16 figures, 6 tables
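The adaptive loss weighting is described only as being based on maximum likelihood estimation. A widely used scheme of that kind treats each loss term as a Gaussian likelihood with a learnable variance `sigma_i^2`, giving a total loss `sum_i [0.5 * exp(-s_i) * L_i + 0.5 * s_i]` with `s_i = log(sigma_i^2)`. The sketch below assumes that scheme (EPINNs' exact rule may differ) and holds the loss values fixed so the weight update can be seen in isolation: gradient descent on `s_i` drives `sigma_i^2` toward `L_i`, so a term with a large loss is automatically down-weighted by `1 / (2 * sigma_i^2)`. The loss values are hypothetical.

```python
import math


def weighted_total_loss(losses, log_vars):
    """Gaussian-likelihood weighting: sum_i 0.5*exp(-s_i)*L_i + 0.5*s_i, s_i = log(sigma_i^2)."""
    return sum(0.5 * math.exp(-s) * L + 0.5 * s for L, s in zip(losses, log_vars))


def update_log_vars(losses, log_vars, lr=0.1):
    """One gradient-descent step on each s_i.

    d/ds_i [0.5*exp(-s_i)*L_i + 0.5*s_i] = 0.5 - 0.5*exp(-s_i)*L_i,
    which is zero when exp(s_i) = sigma_i^2 = L_i.
    """
    return [s - lr * (0.5 - 0.5 * math.exp(-s) * L) for L, s in zip(losses, log_vars)]


def effective_weights(log_vars):
    """Weight multiplying each raw loss term: 1 / (2 * sigma_i^2)."""
    return [0.5 * math.exp(-s) for s in log_vars]


# Hypothetical PDE-residual and boundary-condition loss values, held fixed.
losses = [4.0, 0.01]
log_vars = [0.0, 0.0]
for _ in range(2000):
    log_vars = update_log_vars(losses, log_vars)
print(effective_weights(log_vars))  # approximately [0.125, 50.0]
```

In an actual PINN the `s_i` would be trained jointly with the network parameters, so the weights keep adapting as the residual and boundary losses change during training.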

UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization

Aug 28, 2023

Rui Zhang, Hongxia Wang, Mingshan Du, Hanqing Liu, Yang Zhou, Qiang Zeng

Abstract:The emergence of artificial intelligence-generated content (AIGC) has raised concerns about the authenticity of multimedia content in various fields. However, existing research for forgery content detection has focused mainly on binary classification tasks of complete videos, which has limited applicability in industrial settings. To address this gap, we propose UMMAFormer, a novel universal transformer framework for temporal forgery localization (TFL) that predicts forgery segments with multimodal adaptation. Our approach introduces a Temporal Feature Abnormal Attention (TFAA) module based on temporal feature reconstruction to enhance the detection of temporal differences. We also design a Parallel Cross-Attention Feature Pyramid Network (PCA-FPN) to optimize the Feature Pyramid Network (FPN) for subtle feature enhancement. To evaluate the proposed method, we contribute a novel Temporal Video Inpainting Localization (TVIL) dataset specifically tailored for video inpainting scenes. Our experiments show that our approach achieves state-of-the-art performance on benchmark datasets, including Lav-DF, TVIL, and Psynd, significantly outperforming previous methods. The code and data are available at https://github.com/ymhzyj/UMMAFormer/.

* Proceedings of the 31st ACM International Conference on Multimedia (MM '23), October 29-November 3, 2023
* 11 pages, 8 figures, 66 references. This paper has been accepted for ACM MM 2023

Surrogate-assisted cooperative signal optimization for large-scale traffic networks

Mar 03, 2021

Yongsheng Liang, Zhigang Ren, Lin Wang, Hanqing Liu, Wenhao Du

Abstract:Appropriate traffic signal settings can greatly help alleviate congestion in urban traffic networks. Meta-heuristic optimization algorithms have proven able to find high-quality signal timing plans. However, they generally suffer from performance deterioration when solving large-scale traffic signal optimization problems, owing to the huge search space and limited computational budget. To address this issue, this study proposes a surrogate-assisted cooperative signal optimization (SCSO) method. Unlike existing methods that directly handle the entire traffic network, SCSO first decomposes it into a set of tractable sub-networks and then achieves signal setting by cooperatively optimizing these sub-networks with a surrogate-assisted optimizer. The decomposition significantly narrows the search space of the whole traffic network, and the surrogate-assisted optimizer greatly lowers the computational burden by reducing the number of expensive traffic simulations. By taking the Newman fast algorithm, a radial basis function, and a modified estimation of distribution algorithm as the decomposer, surrogate model, and optimizer, respectively, this study develops a concrete SCSO algorithm. To evaluate its effectiveness and efficiency, a large-scale traffic network involving crossroads and T-junctions is generated based on a real traffic network. Comparison with several existing meta-heuristic algorithms specially designed for traffic signal optimization demonstrates the superiority of SCSO in reducing the average delay time of vehicles.
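SCSO uses a radial basis function model as its surrogate so that most candidate signal plans can be scored without running an expensive traffic simulation. A minimal interpolating RBF surrogate is sketched below. The Gaussian kernel, the shape parameter, and the toy "delay" data are assumptions for illustration, not the paper's settings; the Newman decomposition and the modified estimation of distribution algorithm are not shown.

```python
import math


def gaussian_kernel(r, shape=1.0):
    """Gaussian radial basis function phi(r) = exp(-(shape * r)^2)."""
    return math.exp(-((shape * r) ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class RBFSurrogate:
    """Interpolating Gaussian-RBF surrogate fitted to (decision vector, simulated cost) pairs."""

    def __init__(self, X, y, shape=1.0):
        self.X = [list(map(float, x)) for x in X]
        self.shape = shape
        # Kernel matrix Phi[i][j] = phi(||x_i - x_j||); positive definite for distinct points.
        Phi = [[gaussian_kernel(math.dist(a, c), shape) for c in self.X] for a in self.X]
        self.weights = solve_linear(Phi, list(y))

    def predict(self, x):
        """Cheap estimate of the cost at x: sum_j w_j * phi(||x - x_j||)."""
        return sum(w * gaussian_kernel(math.dist(x, c), self.shape)
                   for w, c in zip(self.weights, self.X))


# Hypothetical data: two normalized timing parameters -> simulated average delay.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = [x0 ** 2 + 2 * x1 for x0, x1 in X]
model = RBFSurrogate(X, y)
print(model.predict([0.25, 0.75]))
```

Inside an optimizer, candidates would be ranked by `predict` and only the most promising few sent to the simulator; their simulated results would then be appended to `X` and `y` before refitting.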

A Surrogate-Assisted Variable Grouping Algorithm for General Large Scale Global Optimization Problems

Jan 19, 2021

An Chen, Zhigang Ren, Muyi Wang, Yongsheng Liang, Hanqing Liu, Wenhao Du

Abstract:Problem decomposition plays a vital role when applying cooperative coevolution (CC) to large scale global optimization problems. However, most learning-based decomposition algorithms either apply only to additively separable problems or suffer from false separability detections. To address these limitations, this study proposes a novel decomposition algorithm called surrogate-assisted variable grouping (SVG). SVG first designs a general-separability-oriented detection criterion based on whether the optimum of a variable changes with other variables. This criterion is consistent with the definition of separability and thus endows SVG with broad applicability and high accuracy. To reduce the number of fitness evaluations required, SVG seeks the optimum of a variable with the help of a surrogate model rather than the original expensive high-dimensional model. Moreover, it converts the variable grouping process into a dynamic binary tree search, which facilitates reusing historical separability detection information and thus reduces the number of detections. To evaluate the performance of SVG, a suite of benchmark functions with up to 2000 dimensions, including additively and non-additively separable ones, was designed. Experimental results on these functions indicate that, compared with six state-of-the-art decomposition algorithms, SVG possesses broader applicability and competitive efficiency. Furthermore, it can significantly enhance the optimization performance of CC.
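SVG's detection criterion asks whether the optimum of one variable moves when the other variables change. The sketch below tests that criterion directly: it minimizes over variable `i` with golden-section search under two different settings of the remaining variables and flags an interaction if the two optima differ. Unlike SVG, it calls the objective directly instead of a surrogate, compares only two contexts, and does no binary-tree grouping; the search interval, tolerances, and function names are assumptions, and each 1-D slice is assumed to be unimodal on the interval.

```python
import math


def golden_section_argmin(g, lo, hi, tol=1e-8):
    """Minimize a 1-D function assumed unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def optimum_of_variable(f, x, i, lo, hi):
    """argmin over x[i] with every other coordinate of x held fixed."""
    def slice_fn(v):
        y = list(x)
        y[i] = v
        return f(y)
    return golden_section_argmin(slice_fn, lo, hi)


def interacts(f, i, context_a, context_b, lo=-5.0, hi=5.0, tol=1e-4):
    """True if the optimum of variable i moves when the other variables change."""
    opt_a = optimum_of_variable(f, context_a, i, lo, hi)
    opt_b = optimum_of_variable(f, context_b, i, lo, hi)
    return abs(opt_a - opt_b) > tol
```

For example, with `f(x) = exp(x[0]**2) * (x[1]**2 + 1)` the optimum of `x[0]` stays at 0 whatever `x[1]` is, so the criterion reports no interaction even though `f` is not additively separable; with `f(x) = (x[0] - x[1])**2` the optimum of `x[0]` tracks `x[1]`, so an interaction is reported. In SVG's setting, a surrogate would stand in for `f` inside `optimum_of_variable`.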
