The Coded Aperture Snapshot Spectral Compressive Imaging (CASSI) system modulates a three-dimensional hyperspectral image (HSI) into a two-dimensional compressed measurement in a single exposure. The HSI is then recovered from the two-dimensional measurement by a reconstruction algorithm. Among these algorithms, deep unfolding methods have demonstrated excellent performance, with RDLUF-MixS^2 achieving the best reconstruction results. However, RDLUF-MixS^2 is expensive to train: training RDLUF-MixS^2-9stg takes approximately 14 days on a single RTX 3090 GPU. Furthermore, RDLUF-MixS^2 performs poorly on real data, producing significant artifacts in the reconstructed images. In this study, we introduce the Dense-spatial Spectral-attention Transformer (DST) into the Proximal Gradient Descent Unfolding Framework (PGDUF), yielding a new method, the Proximal Gradient Descent Unfolding Dense-spatial Spectral-attention Transformer (PGDUDST). Compared with RDLUF-MixS^2, PGDUDST not only exceeds the best reconstruction performance attained by RDLUF-MixS^2 but also converges faster: it requires only 58% of the training time of RDLUF-MixS^2-9stg to achieve comparable reconstruction results. In addition, PGDUDST significantly reduces the artifacts that RDLUF-MixS^2 produces on real experimental data, yielding clearer reconstructed images.
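As a rough illustration of the two ingredients named above, the following sketch shows a CASSI-style forward model (mask, shift, sum) and a single proximal gradient descent step. It is not the PGDUDST implementation. The function names, the single-disperser layout, the 2-pixel dispersion step, and the soft-thresholding operator (standing in for a learned prior such as DST) are all assumptions made for illustration.

```python
import numpy as np

def cassi_forward(cube, mask, step=2):
    """Mask each band, shift it by `step` pixels per band, and sum onto the detector.

    cube: (H, W, L) hyperspectral image; mask: (H, W) coded aperture.
    Returns a 2D measurement of shape (H, W + step * (L - 1)).
    The shift-per-band value is an illustrative assumption.
    """
    H, W, L = cube.shape
    y = np.zeros((H, W + step * (L - 1)), dtype=cube.dtype)
    for l in range(L):
        y[:, step * l : step * l + W] += mask * cube[:, :, l]
    return y

def cassi_adjoint(y, mask, n_bands, step=2):
    """Adjoint of cassi_forward: shift back each band window and re-apply the mask."""
    H, W = mask.shape
    cube = np.empty((H, W, n_bands), dtype=y.dtype)
    for l in range(n_bands):
        cube[:, :, l] = mask * y[:, step * l : step * l + W]
    return cube

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (placeholder for a learned denoiser)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def pgd_step(x, y, mask, rho, tau, step=2):
    """One proximal gradient descent iteration on 0.5 * ||A x - y||^2 + prior."""
    residual = cassi_forward(x, mask, step) - y
    z = x - rho * cassi_adjoint(residual, mask, x.shape[2], step)  # gradient step
    return soft_threshold(z, tau)                                  # proximal step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.random((8, 8, 4))
    mask = (rng.random((8, 8)) > 0.5).astype(float)
    y = cassi_forward(cube, mask)
    x = np.zeros_like(cube)
    for _ in range(20):
        x = pgd_step(x, y, mask, rho=0.25, tau=0.0)
    print("residual norm:", np.linalg.norm(cassi_forward(x, mask) - y))
```

In a deep unfolding network, each such iteration becomes one stage: the step size is typically learned and the hand-crafted proximal operator is replaced by a trained module. The step size `rho=0.25` here is chosen as 1 over the number of bands, which keeps the gradient step stable for a binary mask.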