Concrete slab track has recently seen increasing use, particularly on high-speed lines. Although slab track typically requires little periodic maintenance, there is a significant need to detect its condition efficiently. Data-driven detection methods are promising; however, collecting large amounts of labeled data is particularly challenging because abnormal states are rare in such safety-critical infrastructure. To imitate different healthy and unhealthy states of slab tracks, this study uses three types of slab track supporting conditions on a railway test line. Acceleration sensors (contact) and acoustic sensors (contactless) are installed next to the three types of slab track to collect acceleration and acoustic signals as a train passes at different speeds. We use a deep learning framework based on the recently proposed Denoising Sparse Wavelet Network (DeSpaWN) to automatically learn meaningful and sparse representations of the raw high-frequency signals. A comparative study is conducted among the feature learning / extraction methods, and between acceleration and acoustic signals, by evaluating detection effectiveness with a multi-class support vector machine. The classification accuracy using acceleration signals reaches almost 100%, irrespective of which feature learning / extraction method is adopted. Owing to more severe noise interference, performance with acoustic signals is worse than with acceleration signals. However, it can be significantly improved by learning meaningful features with DeSpaWN.
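As a rough illustration of the feature-extraction step, the sketch below computes sparse wavelet features from a raw signal. It assumes a fixed Haar wavelet and a hand-picked threshold `tau` as stand-ins for what DeSpaWN learns from data; the function names and parameter values are hypothetical, and this is not the implementation used in the study. The resulting feature vectors could then be passed to a multi-class classifier such as a support vector machine.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail).

    If the input has odd length, the last sample is dropped.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def hard_threshold(coeffs, tau):
    """Zero out coefficients whose magnitude does not exceed tau (promotes sparsity)."""
    return [c if abs(c) > tau else 0.0 for c in coeffs]


def sparse_wavelet_features(signal, levels=3, tau=0.5):
    """Mean energy of the thresholded detail coefficients at each level,
    followed by the mean energy of the final approximation.

    `levels` and `tau` are illustrative assumptions, not values from the study.
    """
    if len(signal) < 2 ** levels:
        raise ValueError("signal too short for the requested number of levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail = hard_threshold(detail, tau)
        features.append(sum(c * c for c in detail) / len(detail))
    features.append(sum(c * c for c in approx) / len(approx))
    return features


if __name__ == "__main__":
    # Toy inputs: a rapidly alternating signal and a constant one.
    print(sparse_wavelet_features([1.0, -1.0] * 4))
    print(sparse_wavelet_features([1.0] * 8))
```

With these toy inputs, the alternating signal concentrates its energy in the first (finest) detail level, while the constant signal concentrates it in the final approximation, which is the kind of separation a downstream classifier can exploit.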