Semantic segmentation from aerial views is a vital task for autonomous drones as they require precise and accurate segmentation to traverse safely and efficiently. Segmenting images from aerial views is especially challenging as they include diverse view-points, extreme scale variation and high scene complexity. To address this problem, we propose an end-to-end multi-class semantic segmentation diffusion model. We introduce recursive denoising which allows predicted error to propagate through the denoising process. In addition, we combine this with a hierarchical multi-scale approach, complementary to the diffusion process. Our method achieves state-of-the-art results on UAVid and on the Vaihingen building segmentation benchmark.