Seismic data fault detection has recently been regarded as a 3D image segmentation task. The nature of fault structures in seismic image makes it difficult to manually label faults. Manual labeling often has many false negative labels (abnormal annotations), which will seriously jeopardize the training process. In this work, we find that region-based loss significantly outperforms distribution-based loss when dealing with false negative labels, therefore we proposed Mask Dice loss (MD loss), which is the first reported region-based loss function for training 3D image segmentation networks using sparse 2D slice labels. In addition, fault is an edge feature, and the current network widely used for fault segmentation downsamples the features multiple times, which is not conducive to edge representation and thus requires many parameters and computational effort to preserve the features. We proposed Fault-Net, which uses a high-resolution and shallow structure to propagate multi-scale features in parallel, fully preserving edge features. Meanwhile, in order to efficiently fuse multiscale features, we decouple the convolution process into feature selection and channel fusion, and proposed a lightweight feature fusion block, Multi-Scale Compression Fusion (MCF). Because the Fault-Net always keeps the edge features during propagation, only few parameters and computation are required. Experimental results show that MD loss can clearly weaken the effect of false negative labels. The Fault-Net parameter is only 0.42MB, support up to 528^3(1.5x10^8, Float32) size cuboid inference on 16GB video ram, its inference speed on CPU and GPU is significantly faster than other networks. It works well on most of the open data seismic images, and the result of our approach is state-ofthe-art in FORCE fault identification competition.