In the field of deep learning based computer vision, the development of deep object detection has led to unique paradigms (e.g., two-stage or set-based) and architectures (e.g., Faster-RCNN or DETR) which enable outstanding performance on challenging benchmark datasets. Despite this, the trained object detectors typically do not reliably assess uncertainty regarding their own knowledge, and the quality of their probabilistic predictions is usually poor. As these are often used to make subsequent decisions, such inaccurate probabilistic predictions must be avoided. In this work, we investigate the uncertainty calibration properties of different pretrained object detection architectures in a multi-class setting. We propose a framework to ensure a fair, unbiased, and repeatable evaluation and conduct detailed analyses assessing the calibration under distributional changes (e.g., distributional shift and application to out-of-distribution data). Furthermore, by investigating the influence of different detector paradigms, post-processing steps, and suitable choices of metrics, we deliver novel insights into why poor detector calibration emerges. Based on these insights, we are able to improve the calibration of a detector by simply finetuning its last layer.