Dmitry Nechaev

HISTAI: An Open-Source, Large-Scale Whole Slide Image Dataset for Computational Pathology

May 17, 2025

Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova

Abstract: Recent advancements in Digital Pathology (DP), particularly through artificial intelligence and Foundation Models, have underscored the importance of large-scale, diverse, and richly annotated datasets. Despite their critical role, publicly available Whole Slide Image (WSI) datasets often lack sufficient scale, tissue diversity, and comprehensive clinical metadata, limiting the robustness and generalizability of AI models. In response, we introduce the HISTAI dataset, a large, multimodal, open-access WSI collection comprising over 60,000 slides from various tissue types. Each case in the HISTAI dataset is accompanied by extensive clinical metadata, including diagnosis, demographic information, detailed pathological annotations, and standardized diagnostic coding. The dataset aims to fill gaps identified in existing resources, promoting innovation, reproducibility, and the development of clinically relevant computational pathology solutions. The dataset can be accessed at https://github.com/HistAI/HISTAI.

SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models

Mar 04, 2025

Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova

Abstract: Advancing AI in computational pathology requires large, high-quality, and diverse datasets, yet existing public datasets are often limited in organ diversity, class coverage, or annotation quality. To bridge this gap, we introduce SPIDER (Supervised Pathology Image-DEscription Repository), the largest publicly available patch-level dataset covering multiple organ types, including Skin, Colorectal, and Thorax, with comprehensive class coverage for each organ. SPIDER provides high-quality annotations verified by expert pathologists and includes surrounding context patches, which enhance classification performance by providing spatial context. Alongside the dataset, we present baseline models trained on SPIDER using the Hibou-L foundation model as a feature extractor combined with an attention-based classification head. The models achieve state-of-the-art performance across multiple tissue categories and serve as strong benchmarks for future digital pathology research. Beyond patch classification, the model enables rapid identification of significant areas, quantitative tissue metrics, and establishes a foundation for multimodal approaches. Both the dataset and trained models are publicly available to advance research, reproducibility, and AI-driven pathology development. Access them at: https://github.com/HistAI/SPIDER
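The abstract describes a feature extractor followed by an attention-based classification head that also sees surrounding context patches. The sketch below illustrates one generic way such a head can work (attention pooling over per-patch feature vectors, then a linear classifier). It is not the SPIDER implementation: the function names, the dimensions, the choice of nine patches (a hypothetical 3x3 neighborhood), and the random vectors standing in for Hibou-L features are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(features, V, w, W_cls):
    """Pool N patch feature vectors with learned attention, then classify.

    features: N vectors of size D (center patch plus context patches).
    V (H x D) and w (H) score each patch; W_cls (C x D) maps the pooled
    vector to C class logits. All parameters here are hypothetical.
    """
    # One scalar relevance score per patch: w . tanh(V h_i)
    hidden = [[math.tanh(x) for x in matvec(V, f)] for f in features]
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden]
    weights = softmax(scores)
    # Attention-weighted sum of the patch features
    dim = len(features[0])
    pooled = [sum(a * f[d] for a, f in zip(weights, features)) for d in range(dim)]
    # Linear classifier on the pooled representation
    return softmax(matvec(W_cls, pooled)), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
D, H, C, N = 8, 4, 3, 9  # toy sizes; real feature dimensions are much larger
features = random_matrix(N, D, rng)  # stand-in for extracted patch features
V = random_matrix(H, D, rng)
w = [rng.uniform(-0.5, 0.5) for _ in range(H)]
W_cls = random_matrix(C, D, rng)

probs, weights = attention_head(features, V, w, W_cls)
print("class probabilities:", probs)
print("attention weights:", weights)
```

In this sketch the attention weights indicate how much each patch, center or context, contributes to the prediction, which is one plausible reading of how spatial context could inform a patch-level label.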

Hibou: A Family of Foundational Vision Transformers for Pathology

Jun 07, 2024

Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova

Abstract: Pathology, the microscopic examination of diseased tissue, is critical for diagnosing various medical conditions, particularly cancers. Traditional methods are labor-intensive and prone to human error. Digital pathology, which converts glass slides into high-resolution digital images for analysis by computer algorithms, revolutionizes the field by enhancing diagnostic accuracy, consistency, and efficiency through automated image analysis and large-scale data processing. Foundational transformer pretraining is crucial for developing robust, generalizable models as it enables learning from vast amounts of unannotated data. This paper introduces the Hibou family of foundational vision transformers for pathology, leveraging the DINOv2 framework to pretrain two model variants, Hibou-B and Hibou-L, on a proprietary dataset of over 1 million whole slide images (WSIs) representing diverse tissue types and staining techniques. Our pretrained models demonstrate superior performance on both patch-level and slide-level benchmarks, surpassing existing state-of-the-art methods. Notably, Hibou-L achieves the highest average accuracy across multiple benchmark datasets. To support further research and application in the field, we have open-sourced the Hibou-B model, which can be accessed at https://github.com/HistAI/hibou
