Abstract: This paper presents a novel method for discovering systematic errors in segmentation models. For instance, a segmentation model may exhibit a systematic error by misclassifying a target class such as pedestrians as parking meters in a sufficiently large number of cases. With the rapid deployment of these models in critical applications such as autonomous driving, it is vital to detect and interpret these systematic errors. However, the key challenge is automatically discovering such failures on unlabelled data and forming interpretable semantic sub-groups for intervention. For this, we leverage multimodal foundation models to retrieve errors and use their conceptual linkage, together with the nature of the errors, to study their systematic character. We demonstrate that such errors are present in state-of-the-art segmentation models (UperNet ConvNeXt and UperNet Swin) trained on the Berkeley Deep Drive dataset, and we benchmark the approach qualitatively and quantitatively, showing its effectiveness by discovering coherent systematic errors for these models. Our work opens an avenue for model analysis and intervention that has so far been underexplored in semantic segmentation.
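The abstract leaves the retrieval-and-grouping pipeline unspecified. As one illustration of the general idea, here is a minimal sketch, assuming CLIP as the multimodal foundation model and k-means as the grouping step; these choices, the model checkpoint, and the names `embed_crops`, `group_errors`, and `error_crop_paths` are all assumptions for illustration, not the paper's stated method.

```python
# Sketch: embed image crops of misclassified regions with a multimodal
# foundation model (CLIP, assumed here) and cluster the embeddings into
# candidate semantic sub-groups of systematic errors.
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_crops(crops):
    """Embed PIL image crops of misclassified regions into CLIP space."""
    inputs = processor(images=crops, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    # L2-normalize so Euclidean clustering approximates cosine similarity.
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

def group_errors(crops, n_groups=5):
    """Cluster error crops into candidate systematic-error sub-groups."""
    feats = embed_crops(crops)
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(feats)

# Hypothetical usage: error_crop_paths holds crops of regions the
# segmentation model got wrong; inspect each cluster for coherent failures.
# crops = [Image.open(p) for p in error_crop_paths]
# labels = group_errors(crops)
```

Clusters that are both large and semantically coherent (e.g., many pedestrian regions predicted as parking meters) would then be candidates for the interpretable sub-groups the abstract describes.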