Metal artefacts in CT images may disrupt image quality and interfere with diagnosis. Recently many deep-learning-based CT metal artefact reduction (MAR) methods have been proposed. Current deep MAR methods may be troubled with domain gap problem, where methods trained on simulated data cannot perform well on practical data. In this work, we experimentally investigate two image-domain supervised methods, two dual-domain supervised methods and two image-domain unsupervised methods on a dental dataset and a torso dataset, to explore whether domain gap problem exists or is overcome. We find that I-DL-MAR and DudoNet are effective for practical data of the torso dataset, indicating the domain gap problem is solved. However, none of the investigated methods perform satisfactorily on practical data of the dental dataset. Based on the experimental results, we further analyze the causes of domain gap problem for each method and dataset, which may be beneficial for improving existing methods or designing new ones. The findings suggest that the domain gap problem in deep MAR methods remains to be addressed.