Gabriele Campanella

Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology

Mar 24, 2025

Boqi Chen, Cédric Vincent-Cuaz, Lydia A. Schoenpflug, Manuel Madeira, Lisa Fournier, Vaishnavi Subramanian, Sonali Andani, Samuel Ruiperez-Campillo, Julia E. Vogt, Raphaëlle Luisier(+5 more)

Abstract:Vision foundation models (FMs) are accelerating the development of digital pathology algorithms and transforming biomedical research. These models learn, in a self-supervised manner, to represent histological features in highly heterogeneous tiles extracted from whole-slide images (WSIs) of real-world patient samples. The performance of these FMs is significantly influenced by the size, diversity, and balance of the pre-training data. However, data selection has been primarily guided by expert knowledge at the WSI level, focusing on factors such as disease classification and tissue types, while largely overlooking the granular details available at the tile level. In this paper, we investigate the potential of unsupervised automatic data curation at the tile level, taking into account 350 million tiles. Specifically, we apply hierarchical clustering trees to pre-extracted tile embeddings, allowing us to sample balanced datasets uniformly across the embedding space of the pretrained FM. We further identify that these datasets are subject to a trade-off between size and balance, potentially compromising the quality of representations learned by FMs, and propose tailored batch sampling strategies to mitigate this effect. We demonstrate the effectiveness of our method through improved performance on a diverse range of clinically relevant downstream tasks.

* MICCAI 2025
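
The idea of sampling a balanced dataset across clusters can be illustrated with a minimal sketch. It assumes each tile already has a flat cluster label; the paper itself uses hierarchical clustering trees over embeddings, which this simplification does not reproduce. The function name `balanced_sample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(cluster_ids, n_total, seed=0):
    """Pick up to n_total item indices, spread as evenly as possible over clusters.

    cluster_ids[i] is the (assumed, precomputed) cluster label of item i.
    """
    rng = random.Random(seed)

    # Group item indices by cluster label.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_ids):
        by_cluster[label].append(idx)

    # Shuffle each cluster so that popping yields a random member.
    pools = [rng.sample(members, len(members)) for members in by_cluster.values()]

    # Round-robin over clusters: one item per non-empty cluster per pass.
    picked = []
    while len(picked) < n_total and any(pools):
        for pool in pools:
            if pool and len(picked) < n_total:
                picked.append(pool.pop())
    return picked
```

With cluster sizes 90, 8, and 2 and a request for 12 items, the smallest cluster runs out after two passes, so the result holds 5, 5, and 2 items per cluster. This toy case hints at the size versus balance trade-off mentioned in the abstract: the larger the requested sample, the less balanced it can be.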

A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

Jul 11, 2024

Gabriele Campanella, Shengjia Chen, Ruchika Verma, Jennifer Zeng, Aryeh Stock, Matt Croken, Brandon Veremis, Abdulkadir Elmas, Kuan-lin Huang, Ricky Kwan(+3 more)

Abstract:The use of self-supervised learning (SSL) to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With the increase in availability of public foundation models of different sizes, trained using different algorithms on different datasets, it becomes important to establish a benchmark to compare the performance of such models on a variety of clinically relevant tasks spanning multiple organs and diseases. In this work, we present a collection of pathology datasets comprising clinical slides associated with clinically relevant endpoints including cancer diagnoses and a variety of biomarkers generated during standard hospital operation from two medical centers. We leverage these datasets to systematically assess the performance of public pathology foundation models and provide insights into best practices for training new foundation models and selecting appropriate pretrained models.

* arXiv admin note: text overlap with arXiv:2310.07033

Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective

Jul 10, 2024

Shengjia Chen, Gabriele Campanella, Abdulkadir Elmas, Aryeh Stock, Jennifer Zeng, Alexandros D. Polydorides, Adam J. Schoenfeld, Kuan-lin Huang, Jane Houldsworth, Chad Vanderbilt(+1 more)

Abstract:Recent advances in artificial intelligence (AI), in particular self-supervised learning of foundation models (FMs), are revolutionizing medical imaging and computational pathology (CPath). A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation. Due to the prevalent use of datasets created for genomic research, such as TCGA, for method development, the performance of these techniques on diagnostic slides from clinical practice has been inadequately explored. This study conducts a thorough benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks, including diagnostic assessment, biomarker classification, and outcome prediction. The results yield the following key insights: (1) Embeddings derived from domain-specific (histological images) FMs outperform those from generic ImageNet-based models across aggregation methods. (2) Spatial-aware aggregators enhance the performance significantly when using ImageNet pre-trained models but not when using FMs. (3) No single model excels in all tasks, and spatially-aware models do not show general superiority as might be expected. These findings underscore the need for more adaptable and universally applicable aggregation techniques, guiding future research towards tools that better meet the evolving needs of clinical-AI in pathology. The code used in this work is available at https://github.com/fuchs-lab-public/CPath_SABenchmark.

* 10 pages, 2 figures

Beyond Multiple Instance Learning: Full Resolution All-In-Memory End-To-End Pathology Slide Modeling

Mar 07, 2024

Gabriele Campanella, Eugene Fluder, Jennifer Zeng, Chad Vanderbilt, Thomas J. Fuchs

Abstract:Artificial Intelligence (AI) has great potential to improve health outcomes by training systems on vast digitized clinical datasets. Computational Pathology, with its massive amounts of microscopy image data and impact on diagnostics and biomarkers, is at the forefront of this development. Gigapixel pathology slides pose a unique challenge due to their enormous size and are usually divided into tens of thousands of smaller tiles for analysis. This results in a discontinuity in the machine learning process by separating the training of tile-level encoders from slide-level aggregators and the need to adopt weakly supervised learning strategies. Training models from entire pathology slides end-to-end has been largely unexplored due to its computational challenges. To overcome this problem, we propose a novel approach to jointly train both a tile encoder and a slide-aggregator fully in memory and end-to-end at high-resolution, bridging the gap between input and slide-level supervision. While more computationally expensive, detailed quantitative validation shows promise for large-scale pre-training of pathology foundation models.

Computational Pathology at Health System Scale -- Self-Supervised Foundation Models from Three Billion Images

Oct 10, 2023

Gabriele Campanella, Ricky Kwan, Eugene Fluder, Jennifer Zeng, Aryeh Stock, Brandon Veremis, Alexandros D. Polydorides, Cyrus Hedvat, Adam Schoenfeld, Chad Vanderbilt(+3 more)

Abstract:Recent breakthroughs in self-supervised learning have enabled the use of large unlabeled datasets to train visual foundation models that can generalize to a variety of downstream tasks. While this training paradigm is well suited for the medical domain where annotations are scarce, large-scale pre-training in the medical domain, and in particular pathology, has not been extensively studied. Previous work in self-supervised learning in pathology has leveraged smaller datasets for both pre-training and evaluating downstream performance. The aim of this project is to train the largest academic foundation model and benchmark the most prominent self-supervised learning algorithms by pre-training and evaluating downstream performance on large clinical pathology datasets. We collected the largest pathology dataset to date, consisting of over 3 billion images from over 423 thousand microscopy slides. We compared pre-training of visual transformer models using the masked autoencoder (MAE) and DINO algorithms. We evaluated performance on six clinically relevant tasks from three anatomic sites and two institutions: breast cancer detection, inflammatory bowel disease detection, breast cancer estrogen receptor prediction, lung adenocarcinoma EGFR mutation prediction, and lung cancer immunotherapy response prediction. Our results demonstrate that pre-training on pathology data is beneficial for downstream performance compared to pre-training on natural images. Additionally, the DINO algorithm achieved better generalization performance across all tasks tested. The presented results signify a phase change in computational pathology research, paving the way into a new era of more performant models based on large-scale, parallel pre-training at the billion-image scale.

Deep conditional transformation models for survival analysis

Oct 20, 2022

Gabriele Campanella, Lucas Kook, Ida Häggström, Torsten Hothorn, Thomas J. Fuchs

Abstract:An ever-increasing number of clinical trials feature a time-to-event outcome and record non-tabular patient data, such as magnetic resonance imaging or text data in the form of electronic health records. Recently, several neural-network-based solutions have been proposed, some of which are binary classifiers. Parametric, distribution-free approaches which make full use of survival time and censoring status have not received much attention. We present deep conditional transformation models (DCTMs) for survival outcomes as a unifying approach to parametric and semiparametric survival analysis. DCTMs allow the specification of non-linear and non-proportional hazards for both tabular and non-tabular data and extend to all types of censoring and truncation. On real and semi-synthetic data, we show that DCTMs compete with state-of-the-art DL approaches to survival analysis.

H&E-based Computational Biomarker Enables Universal EGFR Screening for Lung Adenocarcinoma

Jun 21, 2022

Gabriele Campanella, David Ho, Ida Häggström, Anton S Becker, Jason Chang, Chad Vanderbilt, Thomas J Fuchs

Abstract:Lung cancer is the leading cause of cancer death worldwide, with lung adenocarcinoma being the most prevalent form of lung cancer. EGFR-positive lung adenocarcinomas have been shown to have high response rates to TKI therapy, underscoring the essential nature of molecular testing for lung cancers. Although current guidelines consider testing necessary, a large portion of patients are not routinely profiled, resulting in millions of people not receiving the optimal treatment for their lung cancer. Sequencing is the gold standard for molecular testing of EGFR mutations, but it can take several weeks for results to come back, which is not ideal in a time-constrained scenario. The development of alternative screening tools capable of detecting EGFR mutations quickly and cheaply while preserving tissue for sequencing could help reduce the number of sub-optimally treated patients. We propose a multi-modal approach which integrates pathology images and clinical variables to predict EGFR mutational status, achieving an AUC of 84% on the largest clinical cohort to date. Such a computational model could be deployed at large at little additional cost. Its clinical application could reduce the number of patients who receive sub-optimal treatments by 53.1% in China, and up to 96.6% in the US.

Towards Unsupervised Cancer Subtyping: Predicting Prognosis Using A Histologic Visual Dictionary

Mar 12, 2019

Hassan Muhammad, Carlie S. Sigel, Gabriele Campanella, Thomas Boerner, Linda M. Pak, Stefan Büttner, Jan N. M. IJzermans, Bas Groot Koerkamp, Michael Doukas, William R. Jarnagin(+2 more)

Abstract:Unlike common cancers, such as those of the prostate and breast, tumor grading in rare cancers is difficult and largely undefined because of small sample sizes, the sheer volume of time needed to undertake such a task, and the inherent difficulty of extracting human-observed patterns. One of the most challenging examples is intrahepatic cholangiocarcinoma (ICC), a primary liver cancer arising from the biliary system, for which there is well-recognized tumor heterogeneity and no grading paradigm or prognostic biomarkers. In this paper, we propose a new unsupervised deep convolutional autoencoder-based clustering model that groups together cellular and structural morphologies of tumor in 246 ICC digitized whole slides, based on visual similarity. From this visual dictionary of histologic patterns, we use the clusters as covariates to train Cox-proportional hazard survival models. In univariate analysis, three clusters were significantly associated with recurrence-free survival. Combinations of these clusters were significant in multivariate analysis. In a multivariate analysis of all clusters, five showed significance to recurrence-free survival; however, the overall model was not measured to be significant. Finally, a pathologist assigned clinical terminology to the significant clusters in the visual dictionary and found evidence supporting the hypothesis that collagen-enriched fibrosis plays a role in disease severity. These results offer insight into the future of cancer subtyping and show that computational pathology can contribute to disease prognostication, especially in rare cancers.

* 10 pages, 6 figures

Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

Sep 27, 2018

Gabriele Campanella, Vitor Werneck Krauss Silva, Thomas J. Fuchs

Abstract:In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which is not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset, it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption, where only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task, that of prostate cancer diagnosis on needle biopsies. We performed a thorough evaluation of the performance of our MIL pipeline under several conditions, achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for decision support system deployment in the clinic.
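
The MIL assumption mentioned above can be sketched in a few lines: a slide is treated as a bag of tiles, and the bag is positive if at least one tile is positive, so the slide-level score is taken as the maximum tile-level probability. This is the standard max-pooling form of MIL and only an illustration; the abstract does not specify the pipeline's aggregation details, and the names `slide_score` and `top_tile` are hypothetical.

```python
def slide_score(tile_probs):
    """Slide-level probability under the standard MIL assumption (max pooling)."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    return max(tile_probs)


def top_tile(tile_probs):
    """Index of the most suspicious tile, usable for coarse localization."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    return max(range(len(tile_probs)), key=tile_probs.__getitem__)
```

Because only the slide score is compared against the slide diagnosis during training, no tile-level or pixel-level labels are needed, while the top-scoring tile still indicates where the evidence sits.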

DeepPET: A deep encoder-decoder network for directly solving the PET reconstruction inverse problem

Sep 25, 2018

Ida Häggström, C. Ross Schmidtlein, Gabriele Campanella, Thomas J. Fuchs

Abstract:Positron emission tomography (PET) is a cornerstone of modern radiology. The ability to detect cancer and metastases in whole body scans fundamentally changed cancer diagnosis and treatment. One of the main bottlenecks in the clinical application is the time it takes to reconstruct the anatomical image from the deluge of data in PET imaging. State-of-the-art methods based on expectation maximization can take hours for a single patient and depend on manual fine-tuning. This results not only in a financial burden for hospitals but, more importantly, leads to less efficient patient handling, evaluation, and ultimately diagnosis and treatment for patients. To overcome this problem, we present a novel PET image reconstruction technique based on a deep convolutional encoder-decoder network that takes PET sinogram data as input and directly outputs full PET images. Using realistic simulated data, we demonstrate that our network is able to reconstruct images >100 times faster, and with comparable image quality (in terms of root mean squared error) relative to conventional iterative reconstruction techniques.
