Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liangli Zhen

Memory-Augmented Dual-Decoder Networks for Multi-Class Unsupervised Anomaly Detection

Apr 21, 2025

Jingyu Xing, Chenwei Tang, Tao Wang, Rong Xiao, Wei Ju, Ji-Zhe Zhou, Liangli Zhen, Jiancheng Lv

Abstract:Recent advances in unsupervised anomaly detection (UAD) have shifted from single-class to multi-class scenarios. In such complex contexts, the increasing pattern diversity has brought two challenges to reconstruction-based approaches: (1) over-generalization: anomalies that are subtle or share compositional similarities with normal patterns may be reconstructed with high fidelity, making them difficult to distinguish from normal instances; and (2) insufficient normality reconstruction: complex normal features, such as intricate textures or fine-grained structures, may not be faithfully reconstructed due to the model's limited representational capacity, resulting in false positives. Existing methods typically focus on addressing the former, which unintentionally exacerbate the latter, resulting in inadequate representation of intricate normal patterns. To concurrently address these two challenges, we propose a Memory-augmented Dual-Decoder Networks (MDD-Net). This network includes two critical components: a Dual-Decoder Reverse Distillation Network (DRD-Net) and a Class-aware Memory Module (CMM). Specifically, the DRD-Net incorporates a restoration decoder designed to recover normal features from synthetic abnormal inputs and an identity decoder to reconstruct features that maintain the anomalous semantics. By exploiting the discrepancy between features produced by two decoders, our approach refines anomaly scores beyond the conventional encoder-decoder comparison paradigm, effectively reducing false positives and enhancing localization accuracy. Furthermore, the CMM explicitly encodes and preserves class-specific normal prototypes, actively steering the network away from anomaly reconstruction. Comprehensive experimental results across several benchmarks demonstrate the superior performance of our MDD-Net framework over current SoTA approaches in multi-class UAD tasks.

Via

Access Paper or Ask Questions

Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering

Mar 26, 2025

Zehui Liao, Shishuai Hu, Ke Zou, Huazhu Fu, Liangli Zhen, Yong Xia

Abstract:Multimodal large language models (MLLMs) have demonstrated significant potential in medical Visual Question Answering (VQA). Yet, they remain prone to hallucinations-incorrect responses that contradict input images, posing substantial risks in clinical decision-making. Detecting these hallucinations is essential for establishing trust in MLLMs among clinicians and patients, thereby enabling their real-world adoption. Current hallucination detection methods, especially semantic entropy (SE), have demonstrated promising hallucination detection capacity for LLMs. However, adapting SE to medical MLLMs by incorporating visual perturbations presents a dilemma. Weak perturbations preserve image content and ensure clinical validity, but may be overlooked by medical MLLMs, which tend to over rely on language priors. In contrast, strong perturbations can distort essential diagnostic features, compromising clinical interpretation. To address this issue, we propose Vision Amplified Semantic Entropy (VASE), which incorporates weak image transformations and amplifies the impact of visual input, to improve hallucination detection in medical VQA. We first estimate the semantic predictive distribution under weak visual transformations to preserve clinical validity, and then amplify visual influence by contrasting this distribution with that derived from a distorted image. The entropy of the resulting distribution is estimated as VASE. Experiments on two medical open-ended VQA datasets demonstrate that VASE consistently outperforms existing hallucination detection methods.

* 11 pages, 2 figures

Via

Access Paper or Ask Questions

Global Challenge for Safe and Secure LLMs Track 1

Nov 21, 2024

Xiaojun Jia, Yihao Huang, Yang Liu, Peng Yan Tan, Weng Kuan Yau, Mun-Thye Mak, Xin Ming Sim, Wee Siong Ng, See Kiong Ng, Hanqing Liu(+20 more)

Figure 1 for Global Challenge for Safe and Secure LLMs Track 1

Figure 2 for Global Challenge for Safe and Secure LLMs Track 1

Figure 3 for Global Challenge for Safe and Secure LLMs Track 1

Figure 4 for Global Challenge for Safe and Secure LLMs Track 1

Abstract:This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise techniques that could bypass content safeguards across a diverse array of scenarios, from offensive language to misinformation and illegal activities. Through this process, Track 1 aimed to deepen the understanding of LLM vulnerabilities and provide insights for creating more resilient models.

Via

Access Paper or Ask Questions

A Survey and Evaluation of Adversarial Attacks for Object Detection

Aug 06, 2024

Khoi Nguyen Tiet Nguyen, Wenyu Zhang, Kangkang Lu, Yuhuan Wu, Xingjian Zheng, Hui Li Tan, Liangli Zhen

Abstract:Deep learning models excel in various computer vision tasks but are susceptible to adversarial examples-subtle perturbations in input data that lead to incorrect predictions. This vulnerability poses significant risks in safety-critical applications such as autonomous vehicles, security surveillance, and aircraft health monitoring. While numerous surveys focus on adversarial attacks in image classification, the literature on such attacks in object detection is limited. This paper offers a comprehensive taxonomy of adversarial attacks specific to object detection, reviews existing adversarial robustness evaluation metrics, and systematically assesses open-source attack methods and model robustness. Key observations are provided to enhance the understanding of attack effectiveness and corresponding countermeasures. Additionally, we identify crucial research challenges to guide future efforts in securing automated object detection systems.

* 14 pages

Via

Access Paper or Ask Questions

Low-Resolution Self-Attention for Semantic Segmentation

Oct 08, 2023

Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

Abstract:Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on the ADE20K, COCO-Stuff, and Cityscapes datasets demonstrate that LRFormer outperforms state-of-the-art models. The code will be made available at https://github.com/yuhuan-wu/LRFormer.

* 11 pages, 11 tables, 6 figures

Via

Access Paper or Ask Questions

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Oct 07, 2021

Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan

Figure 1 for Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Figure 2 for Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Figure 3 for Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Figure 4 for Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Abstract:Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM s computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM s efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies-StochasticWeight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers, while test accuracies are preserved or even improved.

Via

Access Paper or Ask Questions

Parallel Attention Network with Sequence Matching for Video Grounding

May 18, 2021

Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

Figure 1 for Parallel Attention Network with Sequence Matching for Video Grounding

Figure 2 for Parallel Attention Network with Sequence Matching for Video Grounding

Figure 3 for Parallel Attention Network with Sequence Matching for Video Grounding

Figure 4 for Parallel Attention Network with Sequence Matching for Video Grounding

Abstract:Given a video, video grounding aims to retrieve a temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address the challenges in this task: multi-modal representation learning, and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-modal contexts and cross-modal attentive information between video and text. Inspired by sequence labeling tasks in natural language processing, we split the ground truth moment into begin, inside, and end regions. We then propose a sequence matching strategy to guide start/end boundary predictions using region labels. Experimental results on three datasets show that SeqPAN is superior to state-of-the-art methods. Furthermore, the effectiveness of the self-guided parallel attention module and the sequence matching module is verified.

* 15 pages, 10 figures, 7 tables, Findings at ACL 2021

Via

Access Paper or Ask Questions

Video Corpus Moment Retrieval with Contrastive Learning

May 13, 2021

Hao Zhang, Aixin Sun, Wei Jing, Guoshun Nan, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

Figure 1 for Video Corpus Moment Retrieval with Contrastive Learning

Figure 2 for Video Corpus Moment Retrieval with Contrastive Learning

Figure 3 for Video Corpus Moment Retrieval with Contrastive Learning

Figure 4 for Video Corpus Moment Retrieval with Contrastive Learning

Abstract:Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval (VCMR) is to retrieve a temporal moment (i.e., a fraction of a video) that semantically corresponds to a given text query. As video and text are from two distinct feature spaces, there are two general approaches to address VCMR: (i) to separately encode each modality representations, then align the two modality representations for query processing, and (ii) to adopt fine-grained cross-modal interaction to learn multi-modal representations for query processing. While the second approach often leads to better retrieval accuracy, the first approach is far more efficient. In this paper, we propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR. We adopt the first approach and introduce two contrastive learning objectives to refine video encoder and text encoder to learn video and text representations separately but with better alignment for VCMR. The video contrastive learning (VideoCL) is to maximize mutual information between query and candidate video at video-level. The frame contrastive learning (FrameCL) aims to highlight the moment region corresponds to the query at frame-level, within a video. Experimental results show that, although ReLoCLNet encodes text and video separately for efficiency, its retrieval accuracy is comparable with baselines adopting cross-modal interaction learning.

* 11 pages, 7 figures and 6 tables. Accepted by SIGIR 2021

Via

Access Paper or Ask Questions

Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Mar 02, 2021

Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

Figure 1 for Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Figure 2 for Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Figure 3 for Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Figure 4 for Natural Language Video Localization: A Revisit in Span-based Question Answering Framework

Abstract:Natural Language Video Localization (NLVL) aims to locate a target moment from an untrimmed video that semantically corresponds to a text query. Existing approaches mainly solve the NLVL problem from the perspective of computer vision by formulating it as ranking, anchor, or regression tasks. These methods suffer from large performance degradation when localizing on long videos. In this work, we address the NLVL from a new perspective, i.e., span-based question answering (QA), by treating the input video as a text passage. We propose a video span localizing network (VSLNet), on top of the standard span-based QA framework (named VSLBase), to address NLVL. VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy. QGH guides VSLNet to search for the matching video span within a highlighted region. To address the performance degradation on long videos, we further extend VSLNet to VSLNet-L by applying a multi-scale split-and-concatenation strategy. VSLNet-L first splits the untrimmed video into short clip segments; then, it predicts which clip segment contains the target moment and suppresses the importance of other segments. Finally, the clip segments are concatenated, with different confidences, to locate the target moment accurately. Extensive experiments on three benchmark datasets show that the proposed VSLNet and VSLNet-L outperform the state-of-the-art methods; VSLNet-L addresses the issue of performance degradation on long videos. Our study suggests that the span-based QA framework is an effective strategy to solve the NLVL problem.

* IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
* 15 pages, 18 figures, and 10 tables. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). arXiv admin note: substantial text overlap with arXiv:2004.13931

Via

Access Paper or Ask Questions

Multiobjective Test Problems with Degenerate Pareto Fronts

Jun 07, 2018

Liangli Zhen, Miqing Li, Ran Cheng, Dezhong Peng, Xin Yao

Figure 1 for Multiobjective Test Problems with Degenerate Pareto Fronts

Figure 2 for Multiobjective Test Problems with Degenerate Pareto Fronts

Figure 3 for Multiobjective Test Problems with Degenerate Pareto Fronts

Figure 4 for Multiobjective Test Problems with Degenerate Pareto Fronts

Abstract:In multiobjective optimization, a set of scalable test problems with a variety of features allows researchers to investigate and evaluate abilities of different optimization algorithms, and thus can help them to design and develop more effective and efficient approaches. Existing, commonly-used test problem suites are mainly focused on the situations where all the objectives are conflicting with each other. However, in some many-objective optimization problems, there may be unexpected characteristics among objectives, e.g., redundancy. This leads to a degenerate problem. In this paper, we systematically study degenerate problems. We abstract three generic characteristics of degenerate problems, and on the basis of these characteristics we present a set of test problems, in order to support the investigation of multiobjective search algorithms on problems with redundant objectives. To assess the proposed test problems, ten representative multiobjective evolutionary algorithms are tested. The results indicate that none of the tested algorithms is able to effectively solve these proposed problems, calling for the need of developing new approaches to addressing degenerate multi-objective problems.

* 20 pages, 8 figures

Via

Access Paper or Ask Questions