Abstract: Robustness and generalizability in medical image segmentation are often hindered by the scarcity and limited diversity of training data, which stands in contrast to the variability encountered during inference. While conventional strategies -- such as domain-specific augmentation, specialized architectures, and tailored training procedures -- can alleviate these issues, they depend on the availability and reliability of domain knowledge. When such knowledge is unavailable, misleading, or improperly applied, performance may deteriorate. In response, we introduce a novel, domain-agnostic, add-on, data-driven strategy inspired by image stacking in denoising. Termed "semantic stacking," our method estimates a denoised semantic representation that complements the conventional segmentation loss during training. The method does not rely on domain-specific assumptions, making it broadly applicable across diverse image modalities, model architectures, and augmentation techniques. Through extensive experiments, we show that our approach improves segmentation performance under diverse conditions. Code is available at https://github.com/ymp5078/Semantic-Stacking.
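The training idea described above can be illustrated with a minimal sketch. The PyTorch-style code below is an assumption-laden illustration rather than the authors' implementation: it averages the semantic features of several augmented views into a "stacked" (denoised) estimate and adds a consistency term to the standard segmentation loss. The model interface (returning logits and features), the intensity-only augmentation assumption, and the loss weighting are all hypothetical.

```python
# Illustrative sketch only, not the paper's actual implementation.
import torch
import torch.nn.functional as F

def semantic_stacking_loss(model, images, masks, augment, num_views=4, weight=0.1):
    """Segmentation loss plus a consistency term toward a 'stacked' feature.

    Assumes `model(x)` returns (logits, features) and that `augment` applies
    intensity-only transforms, so the ground-truth masks stay aligned.
    """
    seg_loss, features = 0.0, []
    for _ in range(num_views):
        aug_images = augment(images)                  # any photometric augmentation pipeline
        logits, feats = model(aug_images)             # hypothetical (logits, features) interface
        seg_loss = seg_loss + F.cross_entropy(logits, masks)
        features.append(feats)
    # Average the per-view semantic features to form a denoised estimate.
    stacked = torch.stack(features).mean(dim=0).detach()
    consistency = sum(F.mse_loss(f, stacked) for f in features) / num_views
    return seg_loss / num_views + weight * consistency
```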
Abstract: Semantic segmentation is a core task in computer vision. Existing methods are generally divided into two categories: automatic and interactive. Interactive approaches, exemplified by the Segment Anything Model (SAM), have shown promise as pre-trained models. However, current adaptation strategies for these models lean towards either the automatic or the interactive approach: interactive methods depend on user-provided prompts to operate, while automatic ones bypass interactive promptability entirely. Addressing these limitations, we introduce a novel paradigm and its first model: the Automatic and Interactive Segment Anything Model (AI-SAM). Within this paradigm, we conduct a comprehensive analysis of prompt quality and introduce the pioneering Automatic and Interactive Prompter (AI-Prompter), which automatically generates initial point prompts while accepting additional user inputs. Our experimental results demonstrate AI-SAM's effectiveness in the automatic setting, where it achieves state-of-the-art performance. Notably, it retains the flexibility to incorporate additional user prompts, further enhancing its performance. The project page is available at https://github.com/ymp5078/AI-SAM.
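A minimal sketch of how an automatic-and-interactive prompter might sit inside a SAM-style pipeline is shown below. The class and attribute names (AISamWrapper, ai_prompter, mask_decoder) are hypothetical and do not reflect the repository's actual API; the sketch only conveys the idea of concatenating automatically generated point prompts with optional user-provided points before decoding masks.

```python
# Hypothetical wrapper, assuming SAM-style components; not the real AI-SAM API.
import torch

class AISamWrapper(torch.nn.Module):
    def __init__(self, image_encoder, ai_prompter, mask_decoder):
        super().__init__()
        self.image_encoder = image_encoder    # e.g., a SAM-style ViT backbone
        self.ai_prompter = ai_prompter        # produces initial point prompts automatically
        self.mask_decoder = mask_decoder      # prompt-conditioned mask decoder

    def forward(self, image, user_points=None):
        embedding = self.image_encoder(image)
        auto_points = self.ai_prompter(embedding)        # (B, K, 2) automatic point prompts
        if user_points is not None:                      # optional interactive refinement
            points = torch.cat([auto_points, user_points], dim=1)
        else:
            points = auto_points
        return self.mask_decoder(embedding, points)      # masks from the combined prompts
```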
Abstract: Emotion understanding is an essential but highly challenging component of artificial general intelligence. The absence of extensively annotated datasets has significantly impeded advancements in this field. We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication using only uncurated data. In contrast to the numerical labels or descriptions used in previous methods, communication naturally contains emotion information. Furthermore, acquiring emotion representations from communication is more congruent with the human learning process. We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and to verbal emotion cues through sentiment-guided contrastive learning. Extensive experiments validate the effectiveness and transferability of EmotionCLIP. Using merely a linear-probe evaluation protocol, EmotionCLIP outperforms state-of-the-art supervised visual emotion recognition methods and rivals many multimodal approaches across various benchmarks. We anticipate that the advent of EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains. The code and pre-trained models are available at https://github.com/Xeaver/EmotionCLIP.
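The sentiment-guided contrastive objective mentioned above can be sketched as a CLIP-style loss with softened targets. The code below is an illustrative assumption, not the released implementation: sentiment_sim is a hypothetical nonnegative matrix of pairwise sentiment similarity between samples, and its construction, the embedding dimensions, and the temperature are placeholders.

```python
# Illustrative sketch of a sentiment-guided contrastive loss; names are assumptions.
import torch
import torch.nn.functional as F

def sentiment_guided_contrastive_loss(video_emb, text_emb, sentiment_sim, temperature=0.07):
    """video_emb, text_emb: (N, D) L2-normalized embeddings.
    sentiment_sim: (N, N) nonnegative pairwise sentiment similarity."""
    logits = video_emb @ text_emb.t() / temperature
    # Soft targets: full credit for the matched pair, partial credit for
    # sentiment-similar pairs; rows are L1-normalized to sum to one.
    targets = F.normalize(
        torch.eye(len(logits), device=logits.device) + sentiment_sim, p=1, dim=1
    )
    loss_v2t = torch.sum(-targets * F.log_softmax(logits, dim=1), dim=1).mean()
    loss_t2v = torch.sum(-targets.t() * F.log_softmax(logits.t(), dim=1), dim=1).mean()
    return 0.5 * (loss_v2t + loss_t2v)
```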