https://github.com/MVME-HBUT/SDD-FTI-FDet}.
Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. Existing modeling shortcomings of spatial invariance and pooling layers in conventional CNNs often ignore the neglect of crucial global information, resulting in error localization for fault objection tasks of freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We initially present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We propose a dynamic distillation method without a pre-training teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can jointly train with students to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at \url{