Density reconstruction from X-ray projections is an important problem in radiography with key applications in scientific and industrial X-ray computed tomography (CT). Often, such projections are corrupted by unknown sources of noise and scatter, which when not properly accounted for, can lead to significant errors in density reconstruction. In the setting of this problem, recent deep learning-based methods have shown promise in improving the accuracy of density reconstruction. In this article, we propose a deep learning-based encoder-decoder framework wherein the encoder extracts robust features from noisy/corrupted X-ray projections and the decoder reconstructs the density field from the features extracted by the encoder. We explore three options for the latent-space representation of features: physics-inspired supervision, self-supervision, and no supervision. We find that variants based on self-supervised and physicsinspired supervised features perform better over a range of unknown scatter and noise. In extreme noise settings, the variant with self-supervised features performs best. After investigating further details of the proposed deep-learning methods, we conclude by demonstrating that the newly proposed methods are able to achieve higher accuracy in density reconstruction when compared to a traditional iterative technique.