This paper proposes a network that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) quantifier for WiFi fingerprinting indoor localization. In contrast to conventional methods that use only spatial data with classification models, our CNN-LSTM network extracts both spatial and temporal features of the channel state information (CSI) received from a single router. Furthermore, the proposed network builds a quantification model rather than the limited classification model used in most of the literature, which enables position estimation at testing points that do not coincide with the reference points. We analyze the instability of CSI and demonstrate a mitigation solution based on a comprehensive filtering and normalization scheme. The localization accuracy is investigated through extensive on-site experiments with several mobile devices, including a mobile phone (Nexus 5) and a laptop (Intel 5300 NIC), at hundreds of testing locations. Using only a single WiFi router, our structure achieves an average localization error of 2.5~m with $80\%$ of the errors under 4~m, which outperforms the other reported algorithms by approximately $50\%$ under the same test environment.