Abstract:The kidney biopsy is the gold standard for the diagnosis of kidney diseases. Lesion scores made by expert renal pathologists are semi-quantitative and suffer from high inter-observer variability. Automatically obtaining statistics per segmented anatomical object, therefore, can bring significant benefits in reducing labor and this inter-observer variability. Instance segmentation for a biopsy, however, has been a challenging problem due to (a) the on average large number (around 300 to 1000) of densely touching anatomical structures, (b) with multiple classes (at least 3) and (c) in different sizes and shapes. The currently used instance segmentation models cannot simultaneously deal with these challenges in an efficient yet generic manner. In this paper, we propose the first anchor-free instance segmentation model that combines diffusion models, transformer modules, and RCNNs (regional convolution neural networks). Our model is trained on just one NVIDIA GeForce RTX 3090 GPU, but can efficiently recognize more than 500 objects with 3 common anatomical object classes in renal biopsies, i.e., glomeruli, tubuli, and arteries. Our data set consisted of 303 patches extracted from 148 Jones' silver-stained renal whole slide images (WSIs), where 249 patches were used for training and 54 patches for evaluation. In addition, without adjustment or retraining, the model can directly transfer its domain to generate decent instance segmentation results from PAS-stained WSIs. Importantly, it outperforms other baseline models and reaches an AP 51.7% in detection as the new state-of-the-art.