Yuyuan Liu

Leveraging Labelled Data Knowledge: A Cooperative Rectification Learning Network for Semi-supervised 3D Medical Image Segmentation

Feb 17, 2025

Yanyan Wang, Kechen Song, Yuyuan Liu, Shuai Ma, Yunhui Yan, Gustavo Carneiro

Abstract: Semi-supervised 3D medical image segmentation aims to achieve accurate segmentation using few labelled data and numerous unlabelled data. The main challenge in designing semi-supervised learning methods lies in the effective use of the unlabelled data for training. A promising solution is to enforce consistent predictions across different views of the data, but the efficacy of this strategy depends on the accuracy of the pseudo-labels that the model generates for consistency learning. In this paper, we introduce a new methodology to produce high-quality pseudo-labels for a consistency learning strategy to address semi-supervised 3D medical image segmentation. The methodology has three important contributions. The first contribution is the Cooperative Rectification Learning Network (CRLN), which learns multiple prototypes per class to be used as external knowledge priors to adaptively rectify pseudo-labels at the voxel level. The second contribution is the Dynamic Interaction Module (DIM), which facilitates pairwise and cross-class interactions between prototypes and multi-resolution image features, enabling the production of accurate voxel-level clues for pseudo-label rectification. The third contribution is the Cooperative Positive Supervision (CPS), which optimises uncertain representations to align with unassertive representations of their class distributions, improving the model's accuracy in classifying uncertain regions. Extensive experiments on three public 3D medical segmentation datasets demonstrate the effectiveness and superiority of our semi-supervised learning method.

* Medical Image Analysis
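
The prototype-based rectification described in the CRLN abstract can be illustrated with a minimal sketch. It assumes cosine similarity between voxel features and several prototypes per class, takes each voxel's best-matching prototype per class, and reweights the network's softmax output by the resulting prototype-based probabilities. In CRLN the rectification clues come from the learned Dynamic Interaction Module, so the fixed product rule below and the names `prototype_probs`, `rectify`, and `temperature` are hypothetical stand-ins, not the authors' method.

```python
import numpy as np


def prototype_probs(features, prototypes, temperature=0.1):
    """Class probabilities from similarity to several prototypes per class.

    features:   (N, D) voxel embeddings
    prototypes: (C, K, D) K prototypes for each of C classes
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=2, keepdims=True)
    # cosine similarity between every voxel and every prototype: (N, C, K)
    sim = np.einsum("nd,ckd->nck", f, p)
    # a voxel's affinity to a class is its similarity to that class's closest prototype
    logits = sim.max(axis=2) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def rectify(model_probs, features, prototypes, temperature=0.1):
    """Hypothetical rectification: reweight network probabilities by prototype evidence."""
    q = model_probs * prototype_probs(features, prototypes, temperature)
    return q / q.sum(axis=1, keepdims=True)


# Toy usage: voxels lying next to a class-1 prototype, network undecided (0.5 / 0.5).
rng = np.random.default_rng(0)
protos = rng.normal(size=(2, 3, 8))
feats = protos[1, 0] + 0.01 * rng.normal(size=(4, 8))
model_probs = np.full((4, 2), 0.5)
labels = rectify(model_probs, feats, protos).argmax(axis=1)
```

Taking the maximum over a class's prototypes lets one class be represented by several distinct feature clusters rather than a single mean embedding.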

Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

Feb 07, 2025

Junde Wu, Jiayuan Zhu, Yuyuan Liu

Abstract: We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Unlike conventional LLM-based reasoning approaches, which rely solely on internal inference, Agentic Reasoning dynamically engages web search, code execution, and structured reasoning-context memory to solve complex problems requiring deep research and multi-step logical deduction. Our framework introduces the Mind Map agent, which constructs a structured knowledge graph to track logical relationships, improving deductive reasoning. Additionally, the integration of web-search and coding agents enables real-time retrieval and computational analysis, enhancing reasoning accuracy and decision-making. Evaluations on PhD-level scientific reasoning (GPQA) and domain-specific deep research tasks demonstrate that our approach significantly outperforms existing models, including leading retrieval-augmented generation (RAG) systems and closed-source LLMs. Moreover, our results indicate that agentic reasoning improves expert-level knowledge synthesis, test-time scalability, and structured problem-solving. The code is at: https://github.com/theworldofagents/Agentic-Reasoning.

* work in progress
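
The abstract describes the Mind Map agent only as building a structured knowledge graph that tracks logical relationships. As a hypothetical illustration (the `MindMap` class and its methods are not taken from the linked repository), such a graph can be stored as (subject, relation, object) triples and queried with breadth-first search for the shortest chain of relations linking two entities.

```python
from collections import defaultdict, deque


class MindMap:
    """Toy knowledge graph of (subject, relation, object) triples (hypothetical)."""

    def __init__(self):
        # subject -> list of (relation, object) edges
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def path(self, start, goal):
        """Breadth-first search for the shortest chain of triples from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, trail = queue.popleft()
            if node == goal:
                return trail
            for relation, nxt in self.edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, trail + [(node, relation, nxt)]))
        return None  # no chain of relations connects the two entities


# Toy usage with made-up entities.
mm = MindMap()
mm.add("drug A", "inhibits", "enzyme X")
mm.add("enzyme X", "produces", "metabolite Y")
chain = mm.path("drug A", "metabolite Y")
```

Returning the chain of triples, rather than a yes/no answer, is what makes such a structure useful as reasoning context: the intermediate relations can be fed back to the LLM.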

ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation

Jul 09, 2024

Yuyuan Liu, Yuanhong Chen, Hu Wang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The costly and time-consuming annotation process required to produce large training sets for LiDAR semantic segmentation has motivated the development of semi-supervised learning (SSL) methods. However, such SSL approaches often apply consistency learning only to individual LiDAR representations. This narrow focus results in limited perturbations that generally fail to enable effective consistency learning. Additionally, these SSL approaches employ contrastive learning based on sampling from a limited set of positive and negative embedding samples. This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the effectiveness of perturbations in consistency learning. Furthermore, our contrastive learning employs informative samples drawn from a distribution of positive and negative embeddings learned from the entire training set. Results on public benchmarks show that our approach achieves remarkable improvements over the previous state-of-the-art (SOTA) methods in the field. The code is available at: https://github.com/yyliu01/IT2.

* 27 pages (15 pages main paper and 12 pages supplementary with references), ECCV 2024 accepted
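
The idea of drawing contrastive samples from a distribution learned over the whole training set, instead of from a small batch, can be sketched as follows. This assumes each class's embeddings follow a Gaussian with diagonal covariance and uses a standard InfoNCE loss; the distribution family, the helper names, and the values of `n_pos`, `n_neg`, and `tau` are illustrative assumptions, not IT2's actual formulation.

```python
import numpy as np


def fit_class_gaussians(embeddings, labels, num_classes):
    """Per-class mean and diagonal variance (assumption: Gaussian class embeddings)."""
    stats = []
    for c in range(num_classes):
        e = embeddings[labels == c]
        stats.append((e.mean(axis=0), e.var(axis=0) + 1e-6))
    return stats


def sampled_info_nce(anchor, label, stats, rng, n_pos=4, n_neg=8, tau=0.1):
    """InfoNCE loss whose positives/negatives are drawn from the class distributions."""

    def draw(c, n):
        mean, var = stats[c]
        s = rng.normal(mean, np.sqrt(var), size=(n, mean.shape[0]))
        return s / np.linalg.norm(s, axis=1, keepdims=True)

    a = anchor / np.linalg.norm(anchor)
    pos = draw(label, n_pos)  # positives: samples from the anchor's own class
    neg = np.concatenate([draw(c, n_neg) for c in range(len(stats)) if c != label])
    pos_logits = pos @ a / tau
    neg_logits = neg @ a / tau
    # average the InfoNCE term over positives, each contrasted with all negatives
    losses = [-lp + np.log(np.exp(lp) + np.exp(neg_logits).sum()) for lp in pos_logits]
    return float(np.mean(losses))


# Toy usage: two well-separated classes of 4-D embeddings.
rng = np.random.default_rng(0)
emb = np.concatenate([rng.normal(2.0, 0.3, (50, 4)), rng.normal(-2.0, 0.3, (50, 4))])
lab = np.array([0] * 50 + [1] * 50)
stats = fit_class_gaussians(emb, lab, num_classes=2)
loss_same = sampled_info_nce(emb[0], 0, stats, rng)
loss_other = sampled_info_nce(emb[0], 1, stats, rng)
```

Because the class statistics summarise every labelled embedding seen so far, the sampled positives and negatives are not limited to whatever happens to be in the current mini-batch.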

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation

Jul 07, 2024

Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro

Abstract: Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging a transformer-based segmentation architecture, due to its inherent ability to capture long-range dependencies and its flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy that combines class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is improved with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.
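
To make the matching issue concrete, the sketch below contrasts the two kinds of query. Class-agnostic queries are assigned to ground-truth objects by minimum-cost bipartite matching, whose outcome can change from one iteration to the next. It assumes that each class-conditional query is tied to a fixed class and is therefore assigned directly, without matching. The helper names are hypothetical, and the brute-force solver is only for tiny examples.

```python
from itertools import permutations


def hungarian_bruteforce(cost):
    """Minimum-cost assignment of ground-truth objects (rows) to queries (columns).

    Brute force over permutations, fine for a toy example; real code would use a
    proper Hungarian solver such as scipy.optimize.linear_sum_assignment.
    """
    n_gt, n_q = len(cost), len(cost[0])
    best, best_cols = float("inf"), None
    for cols in permutations(range(n_q), n_gt):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best:
            best, best_cols = total, cols
    return list(best_cols)


def assign_queries(gt_classes, agnostic_cost, class_queries):
    """Hypothetical CPM-style assignment.

    gt_classes:    class id of each ground-truth object
    agnostic_cost: cost[i][j] of matching object i to class-agnostic query j
    class_queries: dict mapping class id -> index of its class-conditional query
    """
    agnostic = hungarian_bruteforce(agnostic_cost)
    # class-conditional queries need no matching: each one is bound to its class
    conditional = [class_queries[c] for c in gt_classes]
    return agnostic, conditional


# Toy usage: 2 objects (classes 3 and 7) and 3 class-agnostic queries.
cost = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]]
agnostic, conditional = assign_queries([3, 7], cost, {3: 0, 7: 1})
```

The deterministic assignment gives the class-conditional queries a stable training target, which is one way to read the abstract's claim that mixing the two query types stabilises bipartite matching.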

Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities

May 12, 2024

Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro

Abstract: In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is whether trained multi-modal models can maintain high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-learned Cross-modal Knowledge Distillation (MCKD) to address this research question. MCKD adaptively estimates the importance weight of each modality through a meta-learning process. These dynamically learned modality importance weights are used in a pairwise cross-modal knowledge distillation process to transfer knowledge from the modalities with higher importance weights to the modalities with lower importance weights. This cross-modal knowledge distillation produces a highly accurate model even in the absence of influential modalities. Unlike previous methods in the field, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on the Brain Tumor Segmentation Dataset 2018 (BraTS2018) and the Audiovision-MNIST classification dataset demonstrate the superiority of MCKD over current state-of-the-art models. In particular, on BraTS2018 we achieve substantial improvements in average segmentation Dice score of 3.51% for enhancing tumor, 2.19% for tumor core, and 1.14% for the whole tumor.
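
The pairwise distillation direction described above (from higher-importance to lower-importance modalities) can be written as a loss in a few lines. This is a minimal sketch: in MCKD the importance weights come from meta-learning, whereas here they are fixed inputs, and scaling each KL term by the gap in importance is an illustrative assumption rather than the paper's exact weighting.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_modal_kd_loss(modal_probs, importance):
    """Hypothetical pairwise distillation: each modality teaches less important ones.

    modal_probs: list of per-modality class distributions for one sample
    importance:  per-modality importance weights (meta-learned in MCKD; fixed here)
    """
    loss = 0.0
    for i, p_teacher in enumerate(modal_probs):
        for j, p_student in enumerate(modal_probs):
            if importance[i] > importance[j]:
                # the importance gap scales how strongly modality j is pulled towards i
                loss += (importance[i] - importance[j]) * kl(p_teacher, p_student)
    return loss


# Toy usage: modality 0 is more important and more confident than modality 1.
probs = [[0.9, 0.1], [0.5, 0.5]]
loss = cross_modal_kd_loss(probs, [0.7, 0.3])
```

Only ordered pairs with a strictly higher teacher weight contribute, so knowledge never flows from a less influential modality to a more influential one, and modalities with equal weights do not distil into each other.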

Generative Noisy-Label Learning by Implicit Dicriminative Approximation with Partial Label Prior

Aug 02, 2023

Fengbei Liu, Yuanhong Chen, Chong Wang, Yuyuan Liu, Gustavo Carneiro

Abstract: Learning with noisy labels has been addressed with both discriminative and generative models. Although discriminative models have dominated the field due to their simpler modeling and more efficient computational training processes, generative models offer a more effective means of disentangling clean and noisy labels and improving the estimation of the label transition matrix. However, generative approaches maximize the joint likelihood of noisy labels and data using a complex formulation that only indirectly optimizes the model of interest associating data and clean labels. Additionally, these approaches rely on generative models that are challenging to train and tend to use uninformative clean label priors. In this paper, we propose a new generative noisy-label learning approach that addresses these three issues. First, we propose a new model optimisation that directly associates data and clean labels. Second, the generative model is implicitly estimated using a discriminative model, eliminating the inefficient training of a generative model. Third, we propose a new informative label prior, inspired by partial label learning, as a supervision signal for noisy-label learning. Extensive experiments on several noisy-label benchmarks demonstrate that our generative model provides state-of-the-art results while maintaining a computational complexity similar to that of discriminative models.
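
In partial label learning, each sample comes with a set of candidate labels, one of which is assumed correct. A hypothetical way to turn this into an informative clean-label prior is to place uniform mass on a candidate set made of the observed noisy label plus the model's top-k predictions. The candidate-set rule and the parameter `k` below are assumptions for illustration, not the construction used in the paper.

```python
def partial_label_prior(noisy_label, probs, k=2):
    """Hypothetical informative prior: uniform mass over a candidate label set.

    noisy_label: observed (possibly wrong) label index
    probs:       current model class probabilities for the sample
    k:           number of top-scoring classes added to the candidate set
    """
    top = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
    candidates = set(top) | {noisy_label}
    return [1.0 / len(candidates) if c in candidates else 0.0 for c in range(len(probs))]


# Toy usage: the noisy label (3) disagrees with the model's preferred classes (1, 2).
prior = partial_label_prior(3, [0.1, 0.6, 0.3, 0.0], k=2)
```

Compared with a uniform prior over all classes, such a prior rules out implausible labels while still allowing the clean label to differ from the noisy one.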

A Closer Look at Audio-Visual Semantic Segmentation

Apr 11, 2023

Yuanhong Chen, Yuyuan Liu, Hu Wang, Fengbei Liu, Chong Wang, Gustavo Carneiro

Abstract: Audio-visual segmentation (AVS) is a complex task that involves accurately segmenting the corresponding sounding object based on audio-visual queries. Successful audio-visual learning requires two essential components: 1) an unbiased dataset with high-quality pixel-level multi-class labels, and 2) a model capable of effectively linking audio information with its corresponding visual object. However, these two requirements are only partially addressed by current methods, with training sets containing biased audio-visual data, and models that generalise poorly beyond this biased training set. In this work, we propose a new strategy to build cost-effective and relatively unbiased audio-visual semantic segmentation benchmarks. Our strategy, called Visual Post-production (VPO), exploits the observation that it is not necessary to have explicit audio-visual pairs extracted from single video sources to build such benchmarks. We also refine the previously proposed AVSBench to transform it into the audio-visual semantic segmentation benchmark AVSBench-Single+. Furthermore, this paper introduces a new pixel-wise audio-visual contrastive learning method to enable better generalisation of the model beyond the training set. We verify the validity of the VPO strategy by showing that state-of-the-art (SOTA) models trained with datasets built by matching audio and visual data from different sources, or with datasets containing audio and visual data from the same video source, produce almost the same accuracy. Then, using the proposed VPO benchmarks and AVSBench-Single+, we show that our method produces more accurate audio-visual semantic segmentation than SOTA models. Code and dataset will be available.

BRAIxDet: Learning to Detect Malignant Breast Lesion with Incomplete Annotations

Feb 02, 2023

Yuanhong Chen, Yuyuan Liu, Chong Wang, Michael Elliott, Chun Fung Kwok, Carlos Pena-Solorzano, Yu Tian, Fengbei Liu, Helen Frazer, Davis J. McCarthy(+1 more)

Abstract: Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly- and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: 1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and 2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly-annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations, and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.

* Under Review
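
The abstract does not detail the student-teacher scheme used in the second stage. A common design, assumed here, keeps the teacher as an exponential moving average (EMA) of the student's weights and lets only confident teacher detections act as pseudo-labels on the weakly annotated images. The helper names and the `momentum` and `threshold` values are hypothetical, and weights are represented as plain dictionaries to keep the sketch self-contained.

```python
def ema_update(teacher, student, momentum=0.99):
    """Mean-teacher update: teacher <- momentum * teacher + (1 - momentum) * student."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k] for k in teacher}


def filter_pseudo_boxes(boxes, scores, threshold=0.5):
    """Keep only teacher detections confident enough to supervise the student."""
    return [b for b, s in zip(boxes, scores) if s >= threshold]


# Toy usage: one scalar "weight" and two candidate lesion boxes (x1, y1, x2, y2).
teacher = ema_update({"w": 1.0}, {"w": 0.0}, momentum=0.9)
pseudo = filter_pseudo_boxes([(10, 10, 50, 50), (5, 5, 8, 8)], [0.7, 0.2])
```

The slowly moving teacher produces more stable pseudo-labels than the student itself, which matters when the only supervision on part of the data is an image-level label.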

Learning Support and Trivial Prototypes for Interpretable Image Classification

Jan 08, 2023

Chong Wang, Yuyuan Liu, Yuanhong Chen, Fengbei Liu, Yu Tian, Davis J. McCarthy, Helen Frazer, Gustavo Carneiro

Abstract: Prototypical part network (ProtoPNet) methods have been designed to achieve interpretable classification by associating predictions with a set of training prototypes, which we refer to as trivial (i.e., easy-to-learn) prototypes because they are trained to lie far from the classification boundary in the feature space. Note that it is possible to make an analogy between ProtoPNet and support vector machine (SVM) given that the classification from both methods relies on computing similarity with a set of training points (i.e., trivial prototypes in ProtoPNet, and support vectors in SVM). However, while trivial prototypes are located far from the classification boundary, support vectors are located close to this boundary, and we argue that this discrepancy with the well-established SVM theory can result in ProtoPNet models with suboptimal classification accuracy. In this paper, we aim to improve the classification accuracy of ProtoPNet with a new method to learn support prototypes that lie near the classification boundary in the feature space, as suggested by the SVM theory. In addition, we target the improvement of classification interpretability with a new model, named ST-ProtoPNet, which exploits our support prototypes and the trivial prototypes to provide complementary interpretability information. Experimental results on CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that the proposed method achieves state-of-the-art classification accuracy and produces more visually meaningful and diverse prototypes.
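
The "similarity with a set of training points" that ProtoPNet-style classifiers rely on can be sketched as follows: squared distances between latent image patches and each prototype are mapped to a log-ratio similarity, max-pooled over patches, and fed to a linear layer. This shows only the shared inference path; the support-prototype training objective proposed in the paper is not shown, and the value of `eps` and the helper names are assumptions.

```python
import numpy as np


def prototype_activation(patches, prototypes, eps=1e-4):
    """ProtoPNet-style similarity, max-pooled over image patches.

    patches:    (P, D) latent patch features of one image
    prototypes: (M, D) learned prototypes
    returns:    (M,) activation of each prototype
    """
    d2 = ((patches[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)  # (P, M)
    # large when a patch is close to the prototype, near zero when it is far away
    sim = np.log((d2 + 1.0) / (d2 + eps))
    return sim.max(axis=0)  # each prototype keeps its best-matching patch


def class_logits(activations, last_layer):
    """Linear layer mapping prototype activations (M,) to class logits (C,)."""
    return last_layer @ activations


# Toy usage: one patch coincides with prototype 0; prototype 1 is far from every patch.
protos = np.array([[0.0, 0.0], [5.0, 5.0]])
patches = np.array([[0.0, 0.0], [1.0, 0.0]])
act = prototype_activation(patches, protos)
logits = class_logits(act, np.array([[1.0, 0.0], [0.0, 1.0]]))
```

Because each activation is tied to a specific image patch and a specific prototype, the prediction can be explained by pointing at the patch that matched; this is the property that ST-ProtoPNet extends by combining support and trivial prototypes.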

Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Nov 26, 2022

Yuyuan Liu, Choubo Ding, Yu Tian, Guansong Pang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: Semantic segmentation models classify pixels into a set of known ("in-distribution") visual classes. When deployed in an open world, the reliability of these models depends on their ability not only to classify in-distribution pixels but also to detect out-of-distribution (OoD) pixels. Historically, the poor OoD detection performance of these models has motivated the design of methods based on model re-training using synthetic training images that include OoD visual objects. Although successful, these re-trained methods have two issues: 1) their in-distribution segmentation accuracy may drop during re-training, and 2) their OoD detection accuracy does not generalise well to new contexts (e.g., country surroundings) outside the training set (e.g., city surroundings). In this paper, we mitigate these issues with: (i) a new residual pattern learning (RPL) module that helps the segmentation model detect OoD pixels without affecting the inlier segmentation performance; and (ii) a novel context-robust contrastive learning (CoroCL) that encourages RPL to robustly detect OoD pixels across various contexts. Our approach improves on the previous state of the art by around 10% in FPR and 7% in AuPRC on the Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly datasets. Our code is available at: https://github.com/yyliu01/RPL.

* 16 pages, 11 figures; preprint version
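
The RPL abstract reports gains in FPR and AuPRC. Assuming FPR refers to the false-positive rate at 95% true-positive rate (FPR95), a common convention on these anomaly-segmentation benchmarks, the metric can be computed from per-pixel anomaly scores as below, treating OoD pixels as the positive class. `fpr_at_tpr` is an illustrative helper, not code from the RPL repository.

```python
import numpy as np


def fpr_at_tpr(scores, is_ood, tpr_target=0.95):
    """False-positive rate on inlier pixels at the threshold reaching the target TPR.

    scores: per-pixel anomaly scores (higher = more likely OoD)
    is_ood: boolean mask, True for OoD pixels (the positive class)
    """
    scores = np.asarray(scores, dtype=float)
    is_ood = np.asarray(is_ood, dtype=bool)
    ood_scores = np.sort(scores[is_ood])[::-1]
    # smallest number of top-scoring OoD pixels that reaches the target recall
    k = int(np.ceil(tpr_target * ood_scores.size))
    threshold = ood_scores[k - 1]
    # fraction of inlier pixels that would also be flagged at this threshold
    return float((scores[~is_ood] >= threshold).mean())


# Toy usage: 10 OoD pixels scored 1..10 and 4 inlier pixels.
ood = np.arange(1, 11, dtype=float)
inliers = np.array([0.0, 0.5, 1.0, 3.0])
fpr = fpr_at_tpr(np.concatenate([ood, inliers]), [True] * 10 + [False] * 4)
```

Lower is better: an FPR95 of 0.5 means that half of the in-distribution pixels are wrongly flagged once the threshold is low enough to catch 95% of the anomalous pixels.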