Traditional snapshot hyperspectral imaging systems generally require multiple refractive optical elements to modulate light, which makes them bulky. In pursuit of a more compact form factor, this paper proposes a metasurface-based snapshot hyperspectral imaging system in which the metasurface and the image-processing pipeline are jointly optimized. The light-manipulation capabilities of metasurfaces are combined with neural networks to encode and decode light fields for hyperspectral imaging. Specifically, the strong dispersion of the metasurface is exploited to distinguish spectral information, and a neural network based on spectral priors is applied for hyperspectral image reconstruction. By constructing a fully differentiable model of metasurface-based hyperspectral imaging, the front-end metasurface phase distribution and the back-end reconstruction network parameters can be jointly optimized. In numerical simulations, this method achieves high-quality hyperspectral reconstruction and outperforms approaches that optimize the optics and the reconstruction separately. The proposed system holds great potential for the miniaturization and portability of hyperspectral imaging systems.
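To make the encoding step concrete, the following is a minimal, forward-only sketch of how a dispersive phase mask could map a hyperspectral cube onto a single sensor image. It is not the model used in this work. The function names `psf_from_phase` and `encode`, the far-field (Fraunhofer) propagation, the idealized rule that the phase scales as the ratio of a reference wavelength to the actual wavelength, and the circular convolution are all illustrative assumptions. Every operation used here (FFT, squared magnitude, multiplication, summation) is differentiable, so the same structure could in principle be written in an automatic-differentiation framework and optimized together with a reconstruction network.

```python
import numpy as np


def psf_from_phase(phase_ref, wavelength, ref_wavelength):
    """Normalized far-field PSF of a phase mask at one wavelength."""
    # ASSUMPTION: idealized dispersion, phase scales as ref_wavelength / wavelength.
    # A real metasurface response would come from a meta-atom model instead.
    phase = phase_ref * (ref_wavelength / wavelength)
    pupil = np.exp(1j * phase)
    # ASSUMPTION: Fraunhofer propagation, so the field at the sensor is the
    # 2D Fourier transform of the pupil function.
    far_field = np.fft.fftshift(np.fft.fft2(pupil))
    psf = np.abs(far_field) ** 2
    return psf / psf.sum()


def encode(cube, phase_ref, wavelengths, ref_wavelength):
    """Blur each spectral band with its own PSF and sum onto one sensor image.

    cube has shape (bands, H, W); phase_ref has shape (H, W).
    """
    sensor = np.zeros(cube.shape[1:])
    for band, lam in zip(cube, wavelengths):
        psf = psf_from_phase(phase_ref, lam, ref_wavelength)
        # Circular convolution via the FFT (periodic boundaries, for brevity).
        otf = np.fft.fft2(np.fft.ifftshift(psf))
        sensor += np.real(np.fft.ifft2(np.fft.fft2(band) * otf))
    return sensor


# Hypothetical usage: a quadratic (lens-like) phase and a random 3-band cube.
n = 16
y, x = np.mgrid[-n // 2 : n // 2, -n // 2 : n // 2]
phase_ref = 0.05 * (x**2 + y**2)
wavelengths = np.array([450e-9, 550e-9, 650e-9])
rng = np.random.default_rng(0)
cube = rng.random((3, n, n))

measurement = encode(cube, phase_ref, wavelengths, ref_wavelength=550e-9)
print(measurement.shape)  # (16, 16)
```

Because each band is blurred by a different point spread function, the single measurement carries wavelength-dependent structure that a reconstruction network can try to invert. With a flat (all-zero) phase, every PSF collapses to a single point and the measurement reduces to the plain sum of the bands, which loses the spectral cue.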