Seismic acquisition footprints appear as stably faint and dim structures and emerge fully spatially coherent, causing inevitable damage to useful signals during the suppression process. Various footprint removal methods, including filtering and sparse representation (SR), have been reported to attain promising results for surmounting this challenge. However, these methods, e.g., SR, rely solely on the handcrafted image priors of useful signals, which is sometimes an unreasonable demand if complex geological structures are contained in the given seismic data. As an alternative, this article proposes a footprint removal network (dubbed FR-Net) for the unsupervised suppression of acquired footprints without any assumptions regarding valuable signals. The key to the FR-Net is to design a unidirectional total variation (UTV) model for footprint acquisition according to the intrinsically directional property of noise. By strongly regularizing a deep convolutional autoencoder (DCAE) using the UTV model, our FR-Net transforms the DCAE from an entirely data-driven model to a \textcolor{black}{prior-augmented} approach, inheriting the superiority of the DCAE and our footprint model. Subsequently, the complete separation of the footprint noise and useful signals is projected in an unsupervised manner, specifically by optimizing the FR-Net via the backpropagation (BP) algorithm. We provide qualitative and quantitative evaluations conducted on three synthetic and field datasets, demonstrating that our FR-Net surpasses the previous state-of-the-art (SOTA) methods.