Medical imaging is nowadays a pillar of diagnostics and therapeutic follow-up. Current research tries to complement established, but ionizing, tomographic techniques with technologies offering reduced radiation exposure. Diffuse Optical Tomography (DOT) uses non-ionizing light in the Near-Infrared (NIR) window to reconstruct optical coefficients in living beings, providing functional information about the composition of the investigated organ or tissue. However, because light scattering dominates at NIR wavelengths, DOT reconstruction is a severely ill-conditioned inverse problem. Conventional reconstruction approaches show severe weaknesses even when dealing with mildly complex cases and/or are computationally very intensive. In this work we explore deep learning techniques for DOT inversion. Namely, we propose a fully data-driven approach based on a modularity concept: first, the measured data and the originating signal are separately processed via autoencoders; then the corresponding low-dimensional latent spaces are connected via a bridging network which simultaneously acts as a learned regularizer.
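A minimal sketch of the modular pipeline described above, written in PyTorch. The module names (AutoEncoder, Bridge), layer sizes, latent dimensions, and the two-stage losses are illustrative assumptions, not the actual architecture or training setup used in this work.

```python
# Illustrative sketch: two autoencoders plus a latent-space bridge.
# All dimensions and hyperparameters are placeholder assumptions.
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    """Fully connected autoencoder with a low-dimensional latent space."""

    def __init__(self, in_dim: int, latent_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class Bridge(nn.Module):
    """Maps the measurement latent code to the image latent code,
    acting as a learned regularizer between the two latent spaces."""

    def __init__(self, latent_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, z_meas):
        return self.net(z_meas)


# Hypothetical sizes: 512 boundary measurements, 32x32 optical-coefficient map.
meas_ae = AutoEncoder(in_dim=512, latent_dim=16)
img_ae = AutoEncoder(in_dim=32 * 32, latent_dim=16)
bridge = Bridge(latent_dim=16)

mse = nn.MSELoss()
y = torch.randn(8, 512)      # batch of measured NIR data (placeholder)
x = torch.randn(8, 32 * 32)  # corresponding optical-coefficient maps (placeholder)

# Stage 1: the two autoencoders are trained on their respective domains.
y_hat, z_y = meas_ae(y)
x_hat, z_x = img_ae(x)
recon_loss = mse(y_hat, y) + mse(x_hat, x)

# Stage 2: the bridge learns to map data latent codes to signal latent codes.
z_pred = bridge(z_y.detach())
bridge_loss = mse(z_pred, z_x.detach())

# Inference: encode measurements, bridge, decode into an optical-coefficient map.
with torch.no_grad():
    x_rec = img_ae.decoder(bridge(meas_ae.encoder(y)))
print(x_rec.shape)  # torch.Size([8, 1024])
```

The separation into independently trained modules mirrors the modularity concept: the autoencoders only learn compact representations of their own domains, while the bridge alone carries the inversion between the latent spaces.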