In this paper, we propose a quality enhancement network for Versatile Video Coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS). The network consists of a temporal structure prediction subnet and a spatial detail enhancement subnet. The former estimates and compensates for temporal motion across frames, while the latter reduces compression artifacts and enhances the reconstruction quality of the VVC compressed video. Experimental results demonstrate the effectiveness of our SDTS-based approach: it achieves a BD-rate saving of over 7.82$\%$ on the common test video sequences and state-of-the-art performance.
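
For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed from two rate-distortion curves. It is an illustration under stated assumptions, not the evaluation script used in this work: the function name `bd_rate`, the cubic fit of log-bitrate against PSNR, and the sample rate/PSNR points are all hypothetical choices made here for illustration. Under the sign convention used in the sketch, a negative value corresponds to a bitrate saving.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = bitrate saving).

    Assumption: log-bitrate is modelled as a cubic polynomial of PSNR,
    which is the commonly used formulation of the metric.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic function of PSNR for each curve.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the PSNR range covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Average log-rate difference converted to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
# A test codec that needs 10% fewer bits at every quality level.
test_rate = [0.9 * r for r in anchor_rate]
test_psnr = anchor_psnr

print(bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr))  # approximately -10.0
```

In this sketch, a result near -7.82 would correspond to the kind of saving reported above, that is, roughly 7.82$\%$ fewer bits than the anchor at equal quality.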