Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liquan Mao

CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images

Dec 07, 2024

Xinjie Sun, Boxiong Wei, Yalong Jiang, Liquan Mao, Qi Zhao

Figure 1 for CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images

Figure 2 for CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images

Figure 3 for CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images

Figure 4 for CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images

Abstract:Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. However, existing methods face challenges in segmentation accuracy, interpretability, and generalization, which hinder their performance. This letter proposes a novel framework, CLIP-TNseg, to address these issues by integrating a multimodal large model with a neural network architecture. CLIP-TNseg consists of two main branches: the Coarse-grained Branch, which extracts high-level semantic features from a frozen CLIP model, and the Fine-grained Branch, which captures fine-grained features using U-Net style residual blocks. These features are fused and processed by the prediction head to generate precise segmentation maps. CLIP-TNseg leverages the Coarse-grained Branch to enhance semantic understanding through textual and high-level visual features, while the Fine-grained Branch refines spatial details, enabling precise and robust segmentation. Extensive experiments on public and our newly collected datasets demonstrate its competitive performance. Our code and the original dataset are available at https://github.com/jayxjsun/CLIP-TNseg.

* 4 pages, 2 figures, submitted to IEEE Signal Processing Letters

Via

Access Paper or Ask Questions

Vision-Language Models Assisted Unsupervised Video Anomaly Detection

Sep 26, 2024

Yalong Jiang, Liquan Mao

Abstract:Video anomaly detection is a subject of great interest across industrial and academic domains due to its crucial role in computer vision applications. However, the inherent unpredictability of anomalies and the scarcity of anomaly samples present significant challenges for unsupervised learning methods. To overcome the limitations of unsupervised learning, which stem from a lack of comprehensive prior knowledge about anomalies, we propose VLAVAD (Video-Language Models Assisted Anomaly Detection). Our method employs a cross-modal pre-trained model that leverages the inferential capabilities of large language models (LLMs) in conjunction with a Selective-Prompt Adapter (SPA) for selecting semantic space. Additionally, we introduce a Sequence State Space Module (S3M) that detects temporal inconsistencies in semantic features. By mapping high-dimensional visual features to low-dimensional semantic ones, our method significantly enhance the interpretability of unsupervised anomaly detection. Our proposed approach effectively tackles the challenge of detecting elusive anomalies that are hard to discern over periods, achieving SOTA on the challenging ShanghaiTech dataset.

Via

Access Paper or Ask Questions