Title: AnoVL: Adapting Vision-Language Models for Unified Zero-shot Anomaly Localization

Aug 30, 2023

Hanqiu Deng, Zhaoxiang Zhang, Jinan Bao, Xingyu Li

Abstract: Contrastive Language-Image Pre-training (CLIP) models have shown promising performance on zero-shot visual recognition tasks by learning visual representations under natural language supervision. Recent studies have attempted to use CLIP for zero-shot anomaly detection by matching images with normal and abnormal state prompts. However, since CLIP focuses on building correspondence between paired text prompts and global image-level representations, the lack of patch-level vision-to-text alignment limits its capability for precise visual anomaly localization. In this work, we introduce a training-free adaptation (TFA) framework of CLIP for zero-shot anomaly localization. In the visual encoder, we propose a training-free value-wise attention mechanism to extract intrinsic local tokens of CLIP for patch-level local description. On the text-supervision side, we design a unified domain-aware contrastive state prompting template. On top of the proposed TFA, we further introduce a test-time adaptation (TTA) mechanism to refine anomaly localization results, in which a layer of trainable parameters in the adapter is optimized using TFA's pseudo-labels and synthetic noise-corrupted tokens. With both TFA and TTA, we significantly exploit the potential of CLIP for zero-shot anomaly localization and demonstrate the effectiveness of our proposed methods on various datasets.
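The abstract describes scoring image patches against normal and abnormal state prompts, with local tokens drawn from a value-wise attention path. What follows is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the patch value vectors and the pooled text embeddings for one normal and one abnormal prompt are already available, with random arrays standing in for real CLIP outputs. It models value-wise attention as self-attention that uses value-value similarity in place of query-key similarity, and it uses a hypothetical temperature of 0.07. The helper names `value_wise_attention` and `patch_anomaly_map` are illustrative only. The domain-aware prompt template and the TTA stage are not covered.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def value_wise_attention(values, scale=None):
    """Hypothetical V-V attention over patch tokens.

    values: (N, D) value vectors, assumed to come from a CLIP attention layer.
    Each token attends to the others using value-value similarity instead of
    query-key similarity, returning (N, D) local tokens.
    """
    _, d = values.shape
    if scale is None:
        scale = 1.0 / np.sqrt(d)
    attn = softmax(values @ values.T * scale, axis=-1)  # (N, N), rows sum to 1
    return attn @ values  # (N, D)


def patch_anomaly_map(patch_tokens, normal_text, abnormal_text, temperature=0.07):
    """Per-patch probability of the 'abnormal' state, shape (N,).

    patch_tokens: (N, D); normal_text, abnormal_text: (D,) prompt embeddings,
    assumed to be pooled over several state templates beforehand.
    """
    p = l2_normalize(patch_tokens)
    t = l2_normalize(np.stack([normal_text, abnormal_text]))  # (2, D)
    logits = p @ t.T / temperature  # (N, 2) scaled cosine similarities
    return softmax(logits, axis=-1)[:, 1]


# Toy usage with random stand-ins for CLIP features on a 7x7 patch grid.
rng = np.random.default_rng(0)
grid, dim = 7, 16
values = rng.normal(size=(grid * grid, dim))
local_tokens = value_wise_attention(values)
normal_t, abnormal_t = rng.normal(size=dim), rng.normal(size=dim)
amap = patch_anomaly_map(local_tokens, normal_t, abnormal_t).reshape(grid, grid)
print(amap.shape)  # (7, 7)
```

The two-way softmax turns each patch's similarity to the normal and abnormal prompts into an anomaly probability; reshaping to the patch grid gives a coarse map that would typically be upsampled to image resolution before evaluation.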