Abstract:Current imaging-based prostate cancer diagnosis requires both MR T2-weighted (T2w) and diffusion-weighted imaging (DWI) sequences, with additional sequences potentially improving accuracy further. However, measuring diffusion patterns in DWI sequences can be time-consuming, prone to artifacts and sensitive to imaging parameters. While machine learning (ML) models have demonstrated radiologist-level accuracy in detecting prostate cancer from these two sequences, this study investigates the potential of ML-enabled methods that use only the T2w sequence as input at inference time. We first discuss the technical feasibility of such a T2-only approach, and then propose a novel ML formulation in which DWI sequences, readily available for training purposes, are used only to train a meta-learning model that subsequently requires only T2w sequences at inference. Using multiple datasets from more than 3,000 prostate cancer patients, we report superior or comparable performance in localising radiologist-identified prostate cancer with our proposed T2-only models, compared with alternative models using T2-only or both sequences as input. Real patient cases are presented and discussed to demonstrate, for the first time, the exclusively true-positive cases from models with different input sequences.
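A minimal PyTorch sketch of the training-only use of DWI described above: an auxiliary DWI branch supervises the T2-only network during training and is discarded at inference. This is a generic privileged-information stand-in, not the paper's exact meta-learning formulation; all module names and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_branch():
    # Toy 3D voxel-wise lesion classifier for a single MR sequence.
    return nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv3d(16, 1, 1))

t2_net, dwi_net = make_branch(), make_branch()
opt = torch.optim.Adam(list(t2_net.parameters()) + list(dwi_net.parameters()), lr=1e-4)

def training_step(t2, dwi, label):
    """t2, dwi, label: (B, 1, D, H, W) tensors; label is a binary cancer mask."""
    logits_t2, logits_dwi = t2_net(t2), dwi_net(dwi)
    loss = (F.binary_cross_entropy_with_logits(logits_t2, label)
            + F.binary_cross_entropy_with_logits(logits_dwi, label)
            + 0.5 * F.mse_loss(logits_t2, logits_dwi.detach()))  # transfer DWI cues to the T2 branch
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Inference uses the T2w volume only: pred = torch.sigmoid(t2_net(t2_volume))
```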
Abstract:Prostate cancer diagnosis through MR imaging currently relies on radiologists' interpretation, whilst modern AI-based methods have been developed to detect clinically significant cancers independently of radiologists. In this study, we propose to develop deep learning models that improve the overall cancer diagnostic accuracy by classifying radiologist-identified patients or lesions (i.e. radiologist-positives), as opposed to existing models that are trained to discriminate over all patients. We develop a single voxel-level classification model, with a simple percentage threshold to determine positive cases at the levels of lesions, Barzell zones and patients. Based on experiments on two clinical data sets, consisting of histopathology-labelled MR images from more than 800 and 500 patients in the UCLA and UCL PROMIS studies respectively, we show that the proposed strategy can improve diagnostic accuracy by augmenting the radiologist reading of the MR imaging. Among varying definitions of clinical significance, the proposed strategy, for example, achieved a specificity of 44.1% (with AI assistance) compared with 36.3% (by radiologists alone), at a controlled sensitivity of 80.0% on the publicly available UCLA data set. This provides measurable clinical value in a range of applications such as reducing unnecessary biopsies, lowering the cost of cancer screening and quantifying risk in therapies.
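A minimal sketch of the percentage-threshold rule described above: voxel-level cancer probabilities are aggregated to a lesion-, zone- or patient-level decision by thresholding the fraction of positive voxels within the corresponding mask. The threshold values below are illustrative assumptions, not the values tuned in the study.

```python
import numpy as np

def region_positive(voxel_probs, region_mask, voxel_thresh=0.5, pct_thresh=0.05):
    """Return True if the region contains 'enough' positive voxels.

    voxel_probs : (D, H, W) array of per-voxel cancer probabilities.
    region_mask : (D, H, W) boolean mask of the lesion / Barzell zone / whole gland.
    """
    probs = voxel_probs[region_mask]
    if probs.size == 0:
        return False
    positive_fraction = np.mean(probs >= voxel_thresh)   # share of positive voxels in the region
    return positive_fraction >= pct_thresh

# Patient-level call: pass the whole-gland mask; lesion-level call: pass each lesion mask.
```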
Abstract:The capabilities of large language models (LLMs) have been applied in expert systems across various domains, providing new opportunities for AI in Education. Educational interactions involve a cyclical exchange between teachers and students. Current research predominantly focuses on using LLMs to simulate teachers, leveraging their expertise to enhance student learning outcomes. However, the simulation of students, which could improve teachers' instructional skills, has received insufficient attention due to the challenges of modeling and evaluating virtual students. This research asks: Can LLMs be utilized to develop virtual student agents that mimic human-like behavior and individual variability? Unlike expert systems focusing on knowledge delivery, virtual students must replicate learning difficulties, emotional responses, and linguistic uncertainties. These traits present significant challenges in both modeling and evaluation. To address these issues, this study focuses on language learning as a context for modeling virtual student agents. We propose a novel AI4Education framework, called SOE (Scene-Object-Evaluation), to systematically construct LVSA (LLM-based Virtual Student Agents). By curating a dataset of personalized teacher-student interactions with various personality traits, question types, and learning stages, and fine-tuning LLMs using LoRA, we conduct multi-dimensional evaluation experiments. Specifically, we: (1) develop a theoretical framework for generating LVSA; (2) integrate human subjective evaluation metrics into GPT-4 assessments, demonstrating a strong correlation between human evaluators and GPT-4 in judging LVSA authenticity; and (3) validate that LLMs can generate human-like, personalized virtual student agents in educational contexts, laying a foundation for future applications in pre-service teacher training and multi-agent simulation environments.
Abstract:Large vision-language models (LVLMs) have made significant strides in addressing complex video tasks, sparking researchers' interest in their human-like multimodal understanding capabilities. Video description serves as a fundamental task for evaluating video comprehension, necessitating a deep understanding of spatial and temporal dynamics, which presents challenges for both humans and machines. Thus, investigating whether LVLMs can describe videos as comprehensively as humans (through reasonable human-machine comparisons using video captioning as a proxy task) will enhance our understanding and application of these models. However, current benchmarks for video comprehension have notable limitations, including short video durations, brief annotations, and reliance on a single annotator's perspective. These factors hinder a comprehensive assessment of LVLMs' ability to understand complex, lengthy videos and prevent the establishment of a robust human baseline that accurately reflects human video comprehension capabilities. To address these issues, we propose a novel benchmark, FIOVA (Five In One Video Annotations), designed to evaluate the differences between LVLMs and human understanding more comprehensively. FIOVA includes 3,002 long video sequences (averaging 33.6 seconds) that cover diverse scenarios with complex spatiotemporal relationships. Each video is annotated by five distinct annotators, capturing a wide range of perspectives and resulting in captions that are 4-15 times longer than those in existing benchmarks, thereby establishing a robust baseline that represents human understanding comprehensively for the first time in video description tasks. Using the FIOVA benchmark, we conducted an in-depth evaluation of six state-of-the-art LVLMs, comparing their performance with that of humans. More detailed information can be found at https://huuuuusy.github.io/fiova/.
Abstract:The Gleason groups serve as the primary histological grading system for prostate cancer, providing crucial insights into the cancer's potential for growth and metastasis. In clinical practice, pathologists determine the Gleason groups based on specimens obtained from ultrasound-guided biopsies. In this study, we investigate the feasibility of directly estimating the Gleason groups from MRI scans to reduce otherwise required biopsies. We identify two characteristics of this task: ordinality and the resulting dependent yet unknown variances between Gleason groups. In addition to the inter-/intra-observer variability in a multi-step Gleason scoring process based on the interpretation of Gleason patterns, our MR-based prediction is also subject to specimen sampling variance and, to a lesser degree, varying MR imaging protocols. To address this challenge, we propose a novel Poisson ordinal network (PON). PONs model the prediction using a Poisson distribution and leverage Poisson encoding and a Poisson focal loss to capture a learnable dependency between ordinal classes (here, Gleason groups), rather than relying solely on the numerical ground truth (e.g. Gleason Groups 1-5 or Gleason Scores 6-10). To improve this modelling efficacy, PONs also employ contrastive learning with a memory bank to regularise intra-class variance, decoupling the memory requirement of contrastive learning from the batch size. Experimental results on images labelled by saturation biopsies from 265 prior-biopsy-blind patients, across two tasks, demonstrate the superiority and effectiveness of our proposed method.
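An illustrative sketch of Poisson target encoding for ordinal classes, assuming the idea is to replace one-hot targets with a Poisson pmf centred on the ground-truth grade and to weight the soft cross-entropy with a focal term; the exact encoding and focal weighting in the paper may differ.

```python
import torch
import torch.nn.functional as F

def poisson_encode(labels, num_classes=5):
    """labels: (B,) integer Gleason-group indices in [0, num_classes)."""
    k = torch.arange(num_classes, dtype=torch.float32)           # candidate classes
    lam = labels.float().unsqueeze(1) + 1.0                      # Poisson rate tied to the grade
    log_pmf = k * torch.log(lam) - lam - torch.lgamma(k + 1.0)   # log Poisson pmf over classes
    pmf = torch.exp(log_pmf)
    return pmf / pmf.sum(dim=1, keepdim=True)                    # renormalise over the class range

def poisson_focal_loss(logits, labels, gamma=2.0):
    """Soft cross-entropy against the Poisson-encoded target, focal-weighted."""
    target = poisson_encode(labels, logits.shape[1])
    log_prob = F.log_softmax(logits, dim=1)
    focal = (1.0 - log_prob.exp()) ** gamma                      # down-weight easy classes
    return -(focal * target * log_prob).sum(dim=1).mean()

# Example usage: logits = model(mri); loss = poisson_focal_loss(logits, gleason_groups)
```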
Abstract:Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID embeddings to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpora presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational complexity concerns by utilizing pretrained LLMs as item encoders and freezing LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on a large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach.
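A minimal sketch of a frozen text-encoder item tower paired with a learnable user tower, illustrating the twin-tower idea above. A Hugging Face encoder is used as a stand-in for the pretrained LLM; the model name, projection size and pooling choice are illustrative assumptions, not the LEARN configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ItemTower(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", dim=64):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():          # freeze to preserve open-world knowledge
            p.requires_grad = False
        self.proj = nn.Linear(self.encoder.config.hidden_size, dim)

    def forward(self, item_texts):
        batch = self.tok(item_texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = self.encoder(**batch).last_hidden_state[:, 0]   # [CLS]-style pooling
        return nn.functional.normalize(self.proj(hidden), dim=-1)

class UserTower(nn.Module):
    def __init__(self, num_users, dim=64):
        super().__init__()
        self.emb = nn.Embedding(num_users, dim)

    def forward(self, user_ids):
        return nn.functional.normalize(self.emb(user_ids), dim=-1)

# Training signal: in-batch softmax over user-item dot products (the recommendation task).
```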
Abstract:Alzheimer's disease (AD) is a progressive and irreversible brain disorder that unfolds over the course of 30 years. Therefore, it is critical to capture the disease progression at an early stage such that intervention can be applied before the onset of symptoms. Machine learning (ML) models have been shown effective in predicting the onset of AD. Yet for subjects with follow-up visits, existing techniques for AD classification only aim for accurate group assignment, where the monotonically increasing risk across follow-up visits is usually ignored. The resulting fluctuating risk scores across visits violate the irreversibility of AD, hampering the trustworthiness of models and providing little value for understanding the disease progression. To address this issue, we propose a novel regularization approach to predict AD longitudinally. Our technique aims to maintain the expected monotonicity of increasing disease risk during progression while preserving expressiveness. Specifically, we introduce a monotonicity constraint that encourages the model to predict disease risk in a consistent and ordered manner across follow-up visits. We evaluate our method using the longitudinal structural MRI and amyloid-PET imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our model outperforms existing techniques in capturing the progressiveness of disease risk, and at the same time preserves prediction accuracy.
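A minimal sketch of a monotonicity regulariser of the kind described above, assuming per-visit risk scores are predicted jointly for each subject; the hinge form, margin and weighting are illustrative assumptions rather than the paper's exact constraint.

```python
import torch

def monotonicity_penalty(risk_scores, margin=0.0):
    """risk_scores: (B, T) predicted AD risk for T chronologically ordered visits.

    Penalises any decrease in risk between consecutive visits, encouraging
    predictions consistent with the irreversibility of AD progression.
    """
    diffs = risk_scores[:, 1:] - risk_scores[:, :-1]   # visit-to-visit change
    return torch.relu(margin - diffs).mean()           # hinge on decreases only

# total_loss = classification_loss + lambda_mono * monotonicity_penalty(risk_scores)
```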
Abstract:For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) has proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails annotation cost that requires significant specialised effort. This paper describes a semi-weakly-supervised registration pipeline that improves model performance, when only a small corresponding-ROI-labelled dataset is available, by exploiting unlabelled image pairs. We examine two types of augmentation, perturbing network weights and resampling images, such that consistency-based unsupervised losses can be applied on unlabelled data. The novel WarpDDF and RegCut approaches are proposed to allow commutative perturbation between an image pair and the predicted spatial transformation (i.e. the respective input and output of registration networks), distinct from existing perturbation methods for classification or segmentation. Experiments using 589 male pelvic MR images, labelled with eight anatomical ROIs, show the improvement in registration performance and the ablated contributions from the individual strategies. Furthermore, this study attempts to construct one of the first computational atlases for pelvic structures, enabled by registering inter-subject MRs, and quantifies the significant differences due to the proposed semi-weak supervision with a discussion on the potential clinical use of example atlas-derived statistics.
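A minimal sketch of a consistency-based unsupervised loss on unlabelled image pairs under network perturbation, as referenced above. This is a generic stand-in using dropout as the weight-level perturbation; the paper's WarpDDF and RegCut operations are not reproduced here, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegNet(nn.Module):
    """Toy registration network predicting a dense displacement field (DDF)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout3d(0.2),                     # stochastic perturbation of activations/weights
            nn.Conv3d(16, 3, 3, padding=1),
        )

    def forward(self, moving, fixed):
        return self.body(torch.cat([moving, fixed], dim=1))   # (B, 3, D, H, W) DDF

def consistency_loss(net, moving, fixed):
    """Two stochastic forward passes on the same unlabelled pair should agree."""
    net.train()                                    # keep dropout active for both passes
    ddf_a = net(moving, fixed)
    ddf_b = net(moving, fixed)
    return F.mse_loss(ddf_a, ddf_b)

# Semi-weak objective: supervised ROI (e.g. Dice) loss on labelled pairs
#                      + lambda_u * consistency_loss on unlabelled pairs
```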
Abstract:Single object tracking (SOT) is a fundamental problem in computer vision, with a wide range of applications, including autonomous driving, augmented reality, and robot navigation. The robustness of SOT faces two main challenges: tiny targets and fast motion. These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera. To evaluate the robustness of SOT methods, we propose BioDrone -- the first bionic drone-based visual benchmark for SOT. Unlike existing UAV datasets, BioDrone features videos captured from a flapping-wing UAV system with major camera shake due to its aerodynamics. BioDrone hence highlights the tracking of tiny targets with drastic changes between consecutive frames, providing a new robust vision benchmark for SOT. To date, BioDrone offers the largest UAV-based SOT benchmark, with high-quality fine-grained manual annotations and automatically generated frame-level labels, designed for robust vision analyses. Leveraging our proposed BioDrone, we conduct a systematic evaluation of existing SOT methods, comparing the performance of 20 representative models and studying novel means of optimizing a SOTA method (KeepTrack) for robust SOT. Our evaluation leads to new baselines and insights for robust SOT. Moving forward, we hope that BioDrone will not only serve as a high-quality benchmark for robust SOT, but also invite future research into robust computer vision. The database, toolkits, evaluation server, and baseline results are available at http://biodrone.aitestunion.com.
Abstract:The prowess that makes few-shot learning desirable in medical image analysis is its efficient use of the support image data, which are labelled to classify or segment new classes, a task that otherwise requires substantially more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively adapted to clinically interesting structures that are absent in training, using only a few labelled images from a different institute. First, to compensate for the widely recognised spatial variability between institutions in episodic adaptation of novel classes, a novel spatial registration mechanism is integrated into prototypical learning, consisting of a segmentation head and a spatial alignment module. Second, to assist the training with observed imperfect alignment, a support mask conditioning module is proposed to further utilise the annotation available from the support images. Extensive experiments are presented in an application of segmenting eight anatomical structures important for interventional planning, using a data set of 589 pelvic T2-weighted MR images, acquired at seven institutes. The results demonstrate the efficacy of each of the 3D formulation, the spatial registration, and the support mask conditioning, all of which made positive contributions independently or collectively. Compared with the previously proposed 2D alternatives, the few-shot segmentation performance was improved with statistical significance, regardless of whether the support data come from the same or different institutes.
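A minimal sketch of the prototypical step at the core of the approach above: masked average pooling of support features into a 3D foreground prototype, followed by cosine-similarity labelling of query voxels. The spatial registration and support mask conditioning modules of the paper are not included, and the threshold is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def foreground_prototype(support_feat, support_mask):
    """support_feat: (C, D, H, W) features; support_mask: (D, H, W) binary ROI mask."""
    mask = support_mask.unsqueeze(0).float()
    return (support_feat * mask).sum(dim=(1, 2, 3)) / (mask.sum() + 1e-6)   # (C,) prototype

def segment_query(query_feat, prototype, threshold=0.5):
    """query_feat: (C, D, H, W); returns a binary foreground prediction per voxel."""
    sim = F.cosine_similarity(query_feat, prototype.view(-1, 1, 1, 1), dim=0)
    return (sim > threshold).float()
```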