The data-dependent superimposed training (DDST) scheme has shown the potential to achieve high bandwidth efficiency, but it suffers from symbol misidentification caused by hardware imperfections. To tackle this challenge, a joint model- and data-driven receiver scheme is proposed in this paper. Specifically, based on the conventional linear receiver model, least squares (LS) estimation and zero-forcing (ZF) equalization are first employed to extract the initial features for channel estimation and data detection. Then, shallow neural networks, named CE-Net and SD-Net, are developed to refine the channel estimation and data detection; the imperfect hardware is modeled as a nonlinear function, and data are used to train these networks to approximate it. Simulation results show that, compared with the conventional minimum mean square error (MMSE) equalization scheme, the proposed scheme effectively suppresses symbol misidentification and achieves similar or better bit error rate (BER) performance without requiring second-order statistics of the channel and noise.
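As an illustration of the model-driven first stage only, the following minimal sketch applies LS channel estimation followed by ZF equalization. It rests on simplifying assumptions that are not taken from this paper: a single-tap flat-fading channel, time-multiplexed pilots rather than superimposed training, QPSK symbols, and no hardware nonlinearity. The function names, the channel gain, and the noise level are hypothetical, and the CE-Net and SD-Net refinement stages are not modeled.

```python
import random

# QPSK constellation with unit average power (assumed modulation).
QPSK = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]


def ls_estimate(pilots, received):
    """LS estimate of a single complex channel gain h from y = h * p + n."""
    num = sum(p.conjugate() * y for p, y in zip(pilots, received))
    den = sum(abs(p) ** 2 for p in pilots)
    return num / den


def zf_equalize(received, h_est):
    """Zero-forcing equalization: invert the estimated channel gain."""
    return [y / h_est for y in received]


def hard_decision(symbols):
    """Map each equalized sample to the nearest QPSK constellation point."""
    return [min(QPSK, key=lambda c: abs(s - c)) for s in symbols]


def simulate(num_pilots=16, num_data=64, noise_std=0.05, seed=0):
    """Run one pilot-plus-data block and return (h, h_ls, symbol_errors)."""
    rng = random.Random(seed)
    h = complex(0.8, -0.6)  # hypothetical flat-fading channel gain

    def noise():
        # Complex Gaussian noise with noise_std per real dimension.
        return complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))

    pilots = [rng.choice(QPSK) for _ in range(num_pilots)]
    data = [rng.choice(QPSK) for _ in range(num_data)]
    y_pilot = [h * p + noise() for p in pilots]
    y_data = [h * x + noise() for x in data]

    h_ls = ls_estimate(pilots, y_pilot)
    detected = hard_decision(zf_equalize(y_data, h_ls))
    errors = sum(d != x for d, x in zip(detected, data))
    return h, h_ls, errors
```

Neither step uses the channel or noise covariance, which is the property that distinguishes the LS/ZF pair from MMSE processing. In the proposed receiver, the outputs of this stage would serve only as initial features for the neural networks.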