Deep learning-based models, such as convolutional neural networks, have advanced various segments of computer vision. However, this technology is rarely applied to seismic shot gather noise localization problem. This letter presents an investigation on the effectiveness of a multi-scale feature-fusion-based network for seismic shot-gather noise localization. Herein, we describe the following: (1) the construction of a real-world dataset of seismic noise localization based on 6,500 seismograms; (2) a multi-scale feature-fusion-based detector that uses the MobileNet combined with the Feature Pyramid Net as the backbone; and (3) the Single Shot multi-box detector for box classification/regression. Additionally, we propose the use of the Focal Loss function that improves the detector's prediction accuracy. The proposed detector achieves an AP@0.5 of 78.67\% in our empirical evaluation.