Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mun Yong Yi

Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling

May 23, 2025

Bryan Wong, Jong Woo Kim, Huazhu Fu, Mun Yong Yi

Abstract:Vision-language models (VLMs) have recently been integrated into multiple instance learning (MIL) frameworks to address the challenge of few-shot, weakly supervised classification of whole slide images (WSIs). A key trend involves leveraging multi-scale information to better represent hierarchical tissue structures. However, existing methods often face two key limitations: (1) insufficient modeling of interactions within the same modalities across scales (e.g., 5x and 20x) and (2) inadequate alignment between visual and textual modalities on the same scale. To address these gaps, we propose HiVE-MIL, a hierarchical vision-language framework that constructs a unified graph consisting of (1) parent-child links between coarse (5x) and fine (20x) visual/textual nodes to capture hierarchical relationships, and (2) heterogeneous intra-scale edges linking visual and textual nodes on the same scale. To further enhance semantic consistency, HiVE-MIL incorporates a two-stage, text-guided dynamic filtering mechanism that removes weakly correlated patch-text pairs, and introduces a hierarchical contrastive loss to align textual semantics across scales. Extensive experiments on TCGA breast, lung, and kidney cancer datasets demonstrate that HiVE-MIL consistently outperforms both traditional MIL and recent VLM-based MIL approaches, achieving gains of up to 4.1% in macro F1 under 16-shot settings. Our results demonstrate the value of jointly modeling hierarchical structure and multimodal alignment for efficient and scalable learning from limited pathology data. The code is available at https://github.com/bryanwong17/HiVE-MIL

Via

Access Paper or Ask Questions

Rethinking LLM-Based Recommendations: A Query Generation-Based, Training-Free Approach

Apr 16, 2025

Donghee Han, Hwanjun Song, Mun Yong Yi

Abstract:Existing large language model LLM-based recommendation methods face several challenges, including inefficiency in handling large candidate pools, sensitivity to item order within prompts ("lost in the middle" phenomenon) poor scalability, and unrealistic evaluation due to random negative sampling. To address these issues, we propose a Query-to-Recommendation approach that leverages LLMs to generate personalized queries for retrieving relevant items from the entire candidate pool, eliminating the need for candidate pre-selection. This method can be integrated into an ID-based recommendation system without additional training, enhances recommendation performance and diversity through LLMs' world knowledge, and performs well even for less popular item groups. Experiments on three datasets show up to 57 percent improvement, with an average gain of 31 percent, demonstrating strong zero-shot performance and further gains when ensembled with existing models.

Via

Access Paper or Ask Questions

Spatial Context-Driven Positive Pair Sampling for Enhanced Histopathology Image Classification

Mar 07, 2025

Willmer Rafell Quinones Robles, Sakonporn Noree, Young Sin Ko, Bryan Wong, Jongwoo Kim, Mun Yong Yi

Abstract:Deep learning has demonstrated great promise in cancer classification from whole-slide images (WSIs) but remains constrained by the need for extensive annotations. Annotation-free methods, such as multiple instance learning (MIL) and self-supervised learning (SSL), have emerged to address this challenge; however, current SSL techniques often depend on synthetic augmentations or temporal context, which may not adequately capture the intricate spatial relationships inherent to histopathology. In this work, we introduce a novel spatial context-driven positive pair sampling strategy for SSL that leverages the natural coherence of adjacent patches in WSIs. By constructing biologically relevant positive pairs from spatially proximate patches, our approach harnesses inherent spatial coherence to enhance patch-level representations, ultimately boosting slide-level classification performance. Experiments on multiple datasets reveal that our strategy improves classification accuracy by 5\% to 10\% over the standard method, paving the way for more clinically relevant AI models in cancer diagnosis. The code is available at https://anonymous.4open.science/r/contextual-pairs-E72F/.

Via

Access Paper or Ask Questions

PreMix: Boosting Multiple Instance Learning in Digital Histopathology through Pre-training with Intra-Batch Slide Mixing

Aug 02, 2024

Bryan Wong, Mun Yong Yi

Abstract:The classification of gigapixel-sized whole slide images (WSIs), digital representations of histological slides obtained via a high-resolution scanner, faces significant challenges associated with the meticulous and time-consuming nature of fine-grained labeling. While weakly-supervised multiple instance learning (MIL) has emerged as a promising approach, current MIL methods are constrained by their limited ability to leverage the wealth of information embedded within unlabeled WSIs. This limitation often necessitates training MIL feature aggregators from scratch after the feature extraction process, hindering efficiency and accuracy. PreMix extends the general MIL framework by pre-training the MIL aggregator with an intra-batch slide mixing approach. Specifically, PreMix incorporates Barlow Twins Slide Mixing during pre-training, enhancing its ability to handle diverse WSI sizes and maximizing the utility of unlabeled WSIs. Combined with Mixup and Manifold Mixup during fine-tuning, PreMix achieves a mean of 4.7% performance improvement over the baseline MIL framework, the hierarchical image pyramid transformer (HIPT), on the Camelyon16 dataset. The observed improvement across a range of active learning acquisition functions and WSI-labeled training budgets highlights the framework's adaptability to diverse datasets and varying resource constraints. Ultimately, PreMix paves the way for more efficient and accurate WSI classification under limited WSI-labeled datasets, encouraging the broader adoption of unlabeled WSI data in histopathological research. The code is available at https://anonymous.4open.science/r/PreMix

* 15 pages

Via

Access Paper or Ask Questions

Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification

Aug 02, 2024

Bryan Wong, Mun Yong Yi

Abstract:Multiple instance learning (MIL) has become a preferred method for classifying gigapixel whole slide images (WSIs), without requiring patch label annotation. The focus of the current MIL research stream is on the embedding-based MIL approach, which involves extracting feature vectors from patches using a pre-trained feature extractor. These feature vectors are then fed into an MIL aggregator for slide-level prediction. Despite prior research suggestions on enhancing the most commonly used ResNet50 supervised model pre-trained on ImageNet-1K, there remains a lack of clear guidance on selecting the optimal feature extractor to maximize WSI performance. This study aims at addressing this gap by examining MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method. Extensive experiments were carried out on the two public WSI datasets (TCGA-NSCLC and Camelyon16) using four SOTA MIL models. The main findings indicate the following: 1) Performance significantly improves with larger and more varied pre-training datasets in both CNN and Transformer backbones. 2) `Modern and deeper' backbones greatly outperform `standard' backbones (ResNet and ViT), with performance improvements more guaranteed in Transformer-based backbones. 3) The choice of self-supervised learning (SSL) method is crucial, with the most significant benefits observed when applied to the Transformer (ViT) backbone. The study findings have practical implications, including designing more effective pathological foundation models. Our code is available at: https://anonymous.4open.science/r/MIL-Feature-Extractor-Selection

* 12 pages

Via

Access Paper or Ask Questions