Pseudo-LiDAR point cloud interpolation is a novel and challenging task in the field of autonomous driving, which aims to address the frequency mismatching problem between camera and LiDAR. Previous works represent the 3D spatial motion relationship induced by a coarse 2D optical flow, and the quality of interpolated point clouds only depends on the supervision of depth maps. As a result, the generated point clouds suffer from inferior global distributions and local appearances. To solve the above problems, we propose a Pseudo-LiDAR point cloud interpolation network to generates temporally and spatially high-quality point cloud sequences. By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship. For the more comprehensive perception of the distribution of point cloud, we design a novel reconstruction loss function that implements the chamfer distance to supervise the generation of Pseudo-LiDAR point clouds in 3D space. In addition, we introduce a multi-modal deep aggregation module to facilitate the efficient fusion of texture and depth features. As the benefits of the improved motion representation, training loss function, and model structure, our approach gains significant improvements on the Pseudo-LiDAR point cloud interpolation task. The experimental results evaluated on KITTI dataset demonstrate the state-of-the-art performance of the proposed network, quantitatively and qualitatively.