Deep unfolding methods and transformer architecture have recently shown promising results in hyperspectral image (HSI) reconstruction. However, there still exist two issues: (1) in the data subproblem, most methods represents the stepsize utilizing a learnable parameter. Nevertheless, for different spectral channel, error between features and ground truth is unequal. (2) Transformer struggles to balance receptive field size with pixel-wise detail information. To overcome the aforementioned drawbacks, We proposed an adaptive step-size perception unfolding network (ASPUN), a deep unfolding network based on FISTA algorithm, which uses an adaptive step-size perception module to estimate the update step-size of each spectral channel. In addition, we design a Non-local Hybrid Attention Transformer(NHAT) module for fully leveraging the receptive field advantage of transformer. By plugging the NLHA into the Non-local Information Aggregation (NLIA) module, the unfolding network can achieve better reconstruction results. Experimental results show that our ASPUN is superior to the existing SOTA algorithms and achieves the best performance.