Owing to its high delay resolution, the ultra-wideband (UWB) technique has been widely adopted for fine-grained indoor localization. Instead of active positioning, this work explores passive human tracking with a multistatic UWB radar built from commercial off-the-shelf (COTS) devices. To extract the time of flight (ToF) of the signal reflected by the moving person, channel impulse responses (CIRs) and their variances are used to train convolutional neural network (CNN) models. A particle filter then tracks the moving person from the ToFs extracted on all links. Experimental results show that the proposed CIR-based and variance-based CNN models achieve root-mean-square errors (RMSEs) of 30.12 cm and 29.04 cm, respectively. In particular, the variance-based CNN model is robust to changes in the scenario, which makes it promising for practical applications.
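To make the tracking step concrete, the following is a minimal sketch of a particle filter driven by per-link ToFs. It is not the authors' implementation. The anchor layout, the random-walk motion model, the Gaussian range likelihood, the noise parameters, and the names `bistatic_range`, `pf_step`, and `demo` are all assumptions introduced for illustration. The sketch also assumes that each ToF covers the full transmitter-person-receiver path, so that multiplying by the speed of light gives a reflected-path length that places the person on an ellipse with the two devices as foci.

```python
import math
import random

C = 299_792_458.0  # speed of light in m/s


def bistatic_range(p, tx, rx):
    """Length in metres of the reflected path tx -> p -> rx."""
    return math.dist(tx, p) + math.dist(p, rx)


def pf_step(particles, tofs, links, rng, motion_std=0.1, range_std=0.3):
    """One predict/update/resample cycle; returns (particles, estimate)."""
    # Predict: random-walk motion model (assumed, not taken from the paper).
    moved = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
             for x, y in particles]

    # Update: accumulate a Gaussian log-likelihood over all links.
    log_w = []
    for p in moved:
        total = 0.0
        for tof, (tx, rx) in zip(tofs, links):
            err = C * tof - bistatic_range(p, tx, rx)
            total -= 0.5 * (err / range_std) ** 2
        log_w.append(total)

    # Normalise in a numerically safe way by subtracting the peak.
    peak = max(log_w)
    weights = [math.exp(w - peak) for w in log_w]
    norm = sum(weights)
    weights = [w / norm for w in weights]

    # Estimate: weighted mean of the particle positions.
    estimate = (sum(w * p[0] for w, p in zip(weights, moved)),
                sum(w * p[1] for w, p in zip(weights, moved)))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


def demo(seed=0):
    """Track a static target from noiseless ToFs in a hypothetical 5 m x 4 m room."""
    rng = random.Random(seed)
    anchors = [(0.0, 0.0), (5.0, 0.0), (5.0, 4.0), (0.0, 4.0)]  # assumed layout
    # Every unordered pair of anchors forms one link.
    links = [(a, b) for i, a in enumerate(anchors) for b in anchors[i + 1:]]
    target = (2.0, 1.5)
    tofs = [bistatic_range(target, tx, rx) / C for tx, rx in links]
    particles = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 4.0))
                 for _ in range(2000)]
    estimate = None
    for _ in range(20):
        particles, estimate = pf_step(particles, tofs, links, rng)
    return estimate


if __name__ == "__main__":
    print(demo())
```

In a real pipeline, the `tofs` list would come from the CNN outputs for each link at each time step rather than from a simulated target, and `range_std` would be tuned to the observed ToF error.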