Abstract: Semantic segmentation is a core task in computer vision. Existing methods are generally divided into two categories: automatic and interactive. Interactive approaches, exemplified by the Segment Anything Model (SAM), have shown promise as pre-trained models. However, current adaptation strategies for these models lean towards either the automatic or the interactive setting: interactive methods require user-provided prompts to operate, while automatic ones forgo interactive promptability entirely. Addressing these limitations, we introduce a novel paradigm and its first model: the Automatic and Interactive Segment Anything Model (AI-SAM). Within this paradigm, we conduct a comprehensive analysis of prompt quality and introduce the Automatic and Interactive Prompter (AI-Prompter), which automatically generates initial point prompts while also accepting additional user inputs. Our experimental results demonstrate AI-SAM's effectiveness in the automatic setting, achieving state-of-the-art performance. Notably, it retains the flexibility to incorporate additional user prompts, further enhancing its performance. The project page is available at https://github.com/ymp5078/AI-SAM.
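To make the automatic-and-interactive prompting idea concrete, below is a minimal PyTorch sketch of an automatic prompter that predicts point prompts from image features and merges them with optional user-supplied points before they would be passed to a SAM-style prompt encoder. All class and function names (AutoPrompter, build_prompts) and the query-attention design are illustrative assumptions, not the actual AI-SAM architecture or API.

```python
# Hypothetical sketch of automatic + interactive point prompting (not the AI-SAM API).
import torch
import torch.nn as nn


class AutoPrompter(nn.Module):
    """Predicts a fixed set of point prompts (coords + fg/bg labels) from image features."""

    def __init__(self, embed_dim: int = 256, num_points: int = 4):
        super().__init__()
        # One learnable query per predicted point; queries attend over image features.
        self.queries = nn.Parameter(torch.randn(num_points, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.to_xy = nn.Linear(embed_dim, 2)      # normalized (x, y) in [0, 1]
        self.to_label = nn.Linear(embed_dim, 1)   # foreground/background logit

    def forward(self, image_embed: torch.Tensor):
        # image_embed: (B, HW, C) flattened image encoder features (assumed layout)
        b = image_embed.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        q, _ = self.attn(q, image_embed, image_embed)
        coords = self.to_xy(q).sigmoid()                     # (B, N, 2)
        labels = (self.to_label(q).squeeze(-1) > 0).long()   # (B, N)
        return coords, labels


def build_prompts(auto_coords, auto_labels, user_coords=None, user_labels=None):
    """Concatenate automatic point prompts with optional interactive user prompts."""
    if user_coords is None:
        return auto_coords, auto_labels
    return (torch.cat([auto_coords, user_coords], dim=1),
            torch.cat([auto_labels, user_labels], dim=1))
```

In this sketch the model runs fully automatically when no user points are given, and simply appends user points to the automatic ones when they are, which mirrors the abstract's claim that additional prompts can refine the automatic result.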
Abstract: Emotion understanding is an essential but highly challenging component of artificial general intelligence. The absence of extensively annotated datasets has significantly impeded advancements in this field. We present EmotionCLIP, the first pre-training paradigm to extract visual emotion representations from verbal and nonverbal communication using only uncurated data. Compared to the numerical labels or descriptions used in previous methods, communication naturally carries emotion information, and acquiring emotion representations from communication is more congruent with the human learning process. We guide EmotionCLIP to attend to nonverbal emotion cues through subject-aware context encoding and to verbal emotion cues through sentiment-guided contrastive learning. Extensive experiments validate the effectiveness and transferability of EmotionCLIP. Using merely a linear-probe evaluation protocol, EmotionCLIP outperforms state-of-the-art supervised visual emotion recognition methods and rivals many multimodal approaches across various benchmarks. We anticipate that EmotionCLIP will address the prevailing issue of data scarcity in emotion understanding, thereby fostering progress in related domains. The code and pre-trained models are available at https://github.com/Xeaver/EmotionCLIP.
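As a rough illustration of what "sentiment-guided contrastive learning" could look like, the sketch below implements a CLIP-style contrastive loss whose targets are softened by sentiment similarity between utterances, so off-diagonal pairs with similar sentiment are treated as weak positives. The soft-target construction and the 0.5 blending weight are assumptions for illustration only, not the exact EmotionCLIP objective.

```python
# Illustrative sentiment-guided contrastive loss (not EmotionCLIP's exact formulation).
import torch
import torch.nn.functional as F


def sentiment_guided_contrastive_loss(video_emb, text_emb, sentiment, temperature=0.07):
    """
    video_emb, text_emb: (B, D) embeddings of paired clips and utterances.
    sentiment: (B,) per-utterance sentiment scores in [-1, 1].
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                                   # (B, B)

    # Soft targets: true pairs blended with sentiment similarity between samples.
    sent_sim = 1.0 - 0.5 * (sentiment[:, None] - sentiment[None, :]).abs()  # in [0, 1]
    targets = 0.5 * torch.eye(len(v), device=v.device) + 0.5 * sent_sim
    targets = targets / targets.sum(dim=-1, keepdim=True)

    # Symmetric cross-entropy against the soft targets (supported in PyTorch >= 1.10).
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.T, targets.T)
    return 0.5 * (loss_v2t + loss_t2v)
```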
Abstract: Test-time adaptation (TTA) is a technique for reducing the distribution gap between the training and testing sets by leveraging unlabeled test data during inference. In this work, we extend TTA to a more practical scenario in which the test data arrives as online streams whose distribution shifts over time. Existing approaches face two challenges: reliance on large test batches drawn from the same domain, and the absence of an explicit model of the continual distribution evolution process. To address both challenges, we propose a meta-learning approach that teaches the network to adapt to distribution-shifting online streams during meta-training. As a result, the trained model can continually adapt to distribution shifts at test time, regardless of batch size restrictions, just as it learned to do during training. We conducted extensive experiments on benchmark TTA datasets covering a broad range of online distribution-shifting settings. Our results show consistent improvements over state-of-the-art methods, indicating the effectiveness of our approach. In addition, we achieved superior performance on a video segmentation task, highlighting the potential of our method for real-world applications.
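A minimal sketch of the meta-training idea follows: simulate a distribution-shifting stream as an episode, adapt a copy of the model on unlabeled batches with an unsupervised objective, and update the original model so that this adaptation improves labeled performance. The use of entropy minimization as the inner objective and the first-order (FOMAML-style) gradient transfer are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of meta-training for online test-time adaptation (illustrative, first-order).
import copy
import torch
import torch.nn.functional as F


def entropy_loss(logits):
    """Mean prediction entropy, a common unsupervised TTA objective."""
    log_p = logits.log_softmax(dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1).mean()


def meta_train_step(model, episode, outer_opt, inner_lr=1e-3):
    """
    episode: list of (x_shifted, x_labeled, y) batches simulating a stream whose
             distribution drifts over time; x_shifted is unlabeled.
    outer_opt: optimizer over model.parameters().
    """
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    total = 0.0
    outer_opt.zero_grad()
    for x_shifted, x_labeled, y in episode:
        # Inner step: unsupervised adaptation on the current (possibly tiny) batch.
        inner_opt.zero_grad()
        entropy_loss(adapted(x_shifted)).backward()
        inner_opt.step()

        # Outer signal: labeled loss of the adapted copy after this step.
        adapted.zero_grad()
        loss = F.cross_entropy(adapted(x_labeled), y)
        loss.backward()
        total += loss.item()

        # First-order approximation: accumulate the copy's gradients on the original model.
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p.grad = p_adapted.grad.clone() if p.grad is None else p.grad + p_adapted.grad

    outer_opt.step()
    return total / len(episode)
```

Because the inner loop only ever sees one stream batch at a time, the meta-trained model learns an adaptation behavior that does not depend on large same-domain test batches, which is the property the abstract emphasizes.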