Muon scattering tomography utilises muons, typically originating from cosmic rays to image the interiors of dense objects. However, due to the low flux of cosmic ray muons at sea-level and the highly complex interactions that muons display when travelling through matter, existing reconstruction algorithms often suffer from low resolution and high noise. In this work, we develop a novel two-stage deep learning algorithm, $\mu$-Net, consisting of an MLP to predict the muon trajectory and a ConvNeXt-based U-Net to convert the scattering points into voxels. $\mu$-Net achieves a state-of-the-art performance of 17.14 PSNR at the dosage of 1024 muons, outperforming traditional reconstruction algorithms such as the point of closest approach algorithm and maximum likelihood and expectation maximisation algorithm. Furthermore, we find that our method is robust to various corruptions such as inaccuracies in the muon momentum or a limited detector resolution. We also generate and publicly release the first large-scale dataset that maps muon detections to voxels. We hope that our research will spark further investigations into the potential of deep learning to revolutionise this field.