Tom Diethe

Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation

May 23, 2025

Zhihua Liu, Amrutha Saseendran, Lei Tong, Xilin He, Fariba Yousefi, Nikolay Burlutskiy, Dino Oglic, Tom Diethe, Philip Teare, Huiyu Zhou (+1 more)

Abstract: Open-set image segmentation poses a significant challenge because existing methods often demand extensive training or fine-tuning and generally struggle to segment unified objects consistently across diverse text reference expressions. Motivated by this, we propose Segment Anyword, a novel training-free visual concept prompt learning approach for open-set language grounded segmentation that relies on token-level cross-attention maps from a frozen diffusion model to produce segmentation surrogates or mask prompts, which are then refined into targeted object masks. Initial prompts typically lack coherence and consistency as the complexity of the image-text increases, resulting in suboptimal mask fragments. To tackle this issue, we further introduce a novel linguistic-guided visual prompt regularization that binds and clusters visual prompts based on sentence dependency and syntactic structural information, enabling the extraction of robust, noise-tolerant mask prompts, and significant improvements in segmentation accuracy. The proposed approach is effective, generalizes across different open-set segmentation tasks, and achieves state-of-the-art results of 52.5 (+6.8 relative) mIoU on Pascal Context 59, 67.73 (+25.73 relative) cIoU on gRefCOCO, and 67.4 (+1.1 relative to fine-tuned methods) mIoU on GranDf, which is the most complex open-set grounded segmentation task in the field.

Big Batch Bayesian Active Learning by Considering Predictive Probabilities

Jan 14, 2025

Sebastian W. Ober, Samuel Power, Tom Diethe, Henry B. Moss

Abstract: We observe that BatchBALD, a popular acquisition function for batch Bayesian active learning for classification, can conflate epistemic and aleatoric uncertainty, leading to suboptimal performance. Motivated by this observation, we propose to focus on the predictive probabilities, which only exhibit epistemic uncertainty. The result is an acquisition function that not only performs better, but is also faster to evaluate, allowing for larger batches than before.
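
To make the epistemic/aleatoric distinction in this abstract concrete, here is a minimal sketch of the textbook entropy decomposition that underlies BALD-style scores. It is not the acquisition function proposed in the paper; the function names and the toy inputs are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy in nats of one categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def decompose_uncertainty(sample_probs):
    """Split predictive uncertainty for a single input.

    sample_probs: list of class-probability vectors, one per posterior sample
    (hypothetical input format, chosen for illustration).
    Returns (total, aleatoric, epistemic) in nats.
    """
    n = len(sample_probs)
    n_classes = len(sample_probs[0])
    # Average the predictions over posterior samples.
    mean_probs = [sum(p[c] for p in sample_probs) / n for c in range(n_classes)]
    total = entropy(mean_probs)                            # entropy of the mean prediction
    aleatoric = sum(entropy(p) for p in sample_probs) / n  # mean entropy per sample
    epistemic = total - aleatoric                          # mutual information (BALD score)
    return total, aleatoric, epistemic

# Every posterior sample agrees the label is a coin flip: pure label noise.
noisy = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Posterior samples are confident but disagree: model uncertainty.
disputed = decompose_uncertainty([[0.9, 0.1], [0.1, 0.9]])
```

Both toy inputs have the same total predictive entropy, but only the second has a non-zero epistemic term, which is the part that additional labels can reduce.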

* 7 pages, 2 figures; presented as a lightning talk at the NeurIPS Workshop on Bayesian Decision-making and Uncertainty (BDU; 2024)

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Oct 24, 2024

Aryo Pradipta Gema, Chen Jin, Ahmed Abdulaal, Tom Diethe, Philip Teare, Beatrice Alex, Pasquale Minervini, Amrutha Saseendran

Abstract: Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
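
The contrast step described in this abstract can be sketched for a single decoding step as follows. The abstract states only that conditional entropy guides the contrast; the specific `(1 + alpha)` / `alpha` weighting below is the common contrastive-decoding form and is an assumption here, as are the function names and the toy logits.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrast_next_token(base_logits, masked_logits):
    """Contrast base-model and masked-model next-token distributions.

    Returns (log-probabilities after contrast, alpha). The weight alpha is
    the entropy of the base distribution, so the contrast is stronger when
    the base model is less certain (assumed weighting, see lead-in).
    """
    log_p_base = log_softmax(base_logits)
    log_p_masked = log_softmax(masked_logits)
    alpha = -sum(math.exp(lp) * lp for lp in log_p_base)
    # Boost tokens the base model prefers more than the masked model does.
    scores = [(1.0 + alpha) * b - alpha * m
              for b, m in zip(log_p_base, log_p_masked)]
    return log_softmax(scores), alpha

# Toy vocabulary of three tokens; masking the heads flips the preference.
base_logits = [2.0, 1.0, 0.0]
masked_logits = [0.0, 1.0, 2.0]
contrasted, alpha = contrast_next_token(base_logits, masked_logits)
```

In this toy case the token favoured by the base model but not by the masked model gains probability mass after the contrast. If the two models agree exactly, the contrast leaves the base distribution unchanged.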

Tackling Structural Hallucination in Image Translation with Local Diffusion

Apr 13, 2024

Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry F. J. Tregidgo, Asher Mullokandov, Philip Teare, Daniel C. Alexander

Abstract: Recent developments in diffusion models have advanced conditioned image generation, yet they struggle with reconstructing out-of-distribution (OOD) images, such as unseen tumors in medical images, causing "image hallucination" and risking misdiagnosis. We hypothesize such hallucinations result from local OOD regions in the conditional images. We verify that partitioning the OOD region and conducting separate image generations alleviates hallucinations in several applications. From this, we propose a training-free diffusion framework that reduces hallucination with multiple Local Diffusion processes. Our approach involves OOD estimation followed by two modules: a "branching" module generates locally both within and outside OOD regions, and a "fusion" module integrates these predictions into one. Our evaluation shows our method mitigates hallucination over baseline models quantitatively and qualitatively, reducing misdiagnosis by 40% and 25% in the real-world medical and natural image datasets, respectively. It also demonstrates compatibility with various pre-trained diffusion models.

Improving Antibody Humanness Prediction using Patent Data

Jan 31, 2024

Talip Ucar, Aubin Ramon, Dino Oglic, Rebecca Croasdale-Wood, Tom Diethe, Pietro Sormanni

Abstract: We investigate the potential of patent data for improving antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised contrastive-learning problem, where each antibody sequence is associated with possibly multiple identifiers of function and the objective is to learn an encoder that groups them according to their patented properties. We then freeze a part of the contrastive encoder and continue training it on the patent data using the cross-entropy loss to predict the humanness score of a given antibody sequence. We illustrate the utility of the patent data and our approach by performing inference on three different immunogenicity datasets, unseen during training. Our empirical results demonstrate that the learned model consistently outperforms the alternative baselines and establishes a new state of the art on five out of six inference tasks, irrespective of the metric used.

* 13 pages, 6 figures, Code: https://github.com/AstraZeneca/SelfPAD

An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Concept Prompt Learning

Oct 18, 2023

Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare

Abstract: Textual Inversion, a prompt learning method, learns a singular embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying and integrating multiple object-level concepts within one scene poses significant challenges even when embeddings for individual concepts are attainable. This is further confirmed by our empirical tests. To address this challenge, we introduce a framework for Multi-Concept Prompt Learning (MCPL), where multiple new "words" are simultaneously learned from a single sentence-image pair. To enhance the accuracy of word-concept correlation, we propose three regularisation techniques: Attention Masking (AttnMask) to concentrate learning on relevant areas; Prompts Contrastive Loss (PromptCL) to separate the embeddings of different concepts; and Bind adjective (Bind adj.) to associate new "words" with known words. We evaluate via image generation, editing, and attention visualisation with diverse images. Extensive quantitative comparisons demonstrate that our method can learn more semantically disentangled concepts with enhanced word-concept correlation. Additionally, we introduce a novel dataset and evaluation protocol tailored for this new task of learning object-level concepts.

* Project page: https://github.com/lxasqjc/MCPL

Unlocking the Heart Using Adaptive Locked Agnostic Networks

Sep 21, 2023

Sylwia Majchrowska, Anders Hildeman, Philip Teare, Tom Diethe

Abstract: Supervised training of deep learning models for medical imaging applications requires a significant amount of labeled data. This poses a challenge because the images must be annotated by medical professionals. To address this limitation, we introduce the Adaptive Locked Agnostic Network (ALAN), a concept involving self-supervised visual feature extraction using a large backbone model to produce anatomically robust semantic self-segmentation. In the ALAN methodology, this self-supervised training occurs only once on a large and diverse dataset. Due to the intuitive interpretability of the segmentation, downstream models tailored for specific tasks can be easily designed using white-box models with few parameters. This, in turn, opens up the possibility of communicating the inner workings of a model with domain experts and introducing prior knowledge into it. It also means that the downstream models become less data-hungry compared to fully supervised approaches. These characteristics make ALAN particularly well-suited for resource-scarce scenarios, such as costly clinical trials and rare diseases. In this paper, we apply the ALAN approach to three publicly available echocardiography datasets: EchoNet-Dynamic, CAMUS, and TMED-2. Our findings demonstrate that the self-supervised backbone model robustly identifies anatomical subregions of the heart in an apical four-chamber view. Building upon this, we design two downstream models, one for segmenting a target anatomical region, and a second for echocardiogram view classification.

* The article was accepted to ICCV 2023 workshop PerDream: PERception, Decision making and REAsoning through Multimodal foundational modeling

Continual Density Ratio Estimation in an Online Setting

Mar 09, 2021

Yu Chen, Song Liu, Tom Diethe, Peter Flach

Abstract: In online applications with streaming data, awareness of how far the training or test set has shifted away from the original dataset can be crucial to the performance of the model. However, we may not have access to historical samples in the data stream. To cope with such situations, we propose a novel method, Continual Density Ratio Estimation (CDRE), for estimating density ratios between the initial and current distributions ($p/q_t$) of a data stream in an iterative fashion without the need to store past samples, where $q_t$ is shifting away from $p$ over time $t$. We demonstrate that CDRE can be more accurate than standard DRE in terms of estimating divergences between distributions, despite not requiring samples from the original distribution. CDRE can be applied in scenarios of online learning, such as importance-weighted covariate shift and tracing dataset changes for better decision making. In addition, CDRE enables the evaluation of generative models under the setting of continual learning. To the best of our knowledge, there is no existing method that can evaluate generative models in continual learning without storing samples from the original distribution.
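
One way to see why an iterative estimate of $p/q_t$ is plausible without stored samples is the telescoping identity below. The identity itself is elementary; whether CDRE uses exactly this factorisation is an assumption not stated in the abstract, and the symbols $r_t$ and $s_t$ are introduced here only for illustration.

```latex
r_t(x) \;=\; \frac{p(x)}{q_t(x)}
       \;=\; \frac{p(x)}{q_{t-1}(x)} \cdot \frac{q_{t-1}(x)}{q_t(x)}
       \;=\; r_{t-1}(x)\, s_t(x),
\qquad s_t(x) = \frac{q_{t-1}(x)}{q_t(x)}
```

Under this reading, each update would need only the previous estimate $r_{t-1}$ and a ratio $s_t$ between two consecutive distributions, rather than samples from $p$.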

Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams

Aug 04, 2020

Charlie Dickens, Eric Meissner, Pablo G. Moreno, Tom Diethe

Abstract: Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) $k$d-trees to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to interpret, making it difficult to assess the severity of the detected anomalies or select a reasonable threshold in the absence of labelled anomalies. To solve these issues, we contextualise these methods in a probabilistic framework which we call the Mondrian Pólya Forest for estimating the underlying probability density function generating the data and enabling greater interpretability than prior work. In addition, we develop a memory-efficient variant able to operate in modern streaming environments. Our experiments show that these methods achieve state-of-the-art performance while providing statistically interpretable anomaly scores.

Bypassing Gradients Re-Projection with Episodic Memories in Online Continual Learning

Jun 19, 2020

Yu Chen, Tom Diethe, Peter Flach

Abstract: The use of episodic memories in continual learning is an efficient way to prevent the phenomenon of catastrophic forgetting. In recent studies, several gradient-based approaches have been developed to make more efficient use of compact episodic memories, which constrain the gradients resulting from new samples with gradients from memorized samples. In this paper, we propose a method for decreasing the diversity of gradients through an auxiliary optimization objective that we call Discriminative Representation Loss, instead of directly re-projecting the gradients. Our methods show promising performance with relatively cheap computational cost on several benchmark experiments.
