Fluorescence lifetime imaging (FLI) has received increasing attention in recent years as a powerful imaging technique in biological and medical research. However, existing FLI systems often suffer from a tradeoff between processing speed, accuracy, and robustness. In this paper, we propose a single-photon avalanche diode (SPAD) time-correlated single-photon counting (TCSPC) system coupled to a recurrent neural network (RNN) for FLI that accurately estimates the fluorescence lifetime on the fly, directly from raw timestamps rather than histograms, which drastically reduces the data transfer rate and hardware resource utilization. We train two RNN variants on a synthetic dataset and compare the results to those obtained with the center-of-mass method (CMM) and least-squares (LS) fitting. The results demonstrate that the two RNN variants, the gated recurrent unit (GRU) and long short-term memory (LSTM), match CMM and LS fitting in accuracy and outperform both by a large margin in the presence of background noise. We also compare against the Cramér-Rao lower bound, and a detailed analysis shows that the RNN models approach the theoretical optimum. Analysis of experimental data shows that our model, trained purely on synthetic data, generalizes well to real-world data. We build an FLI microscope setup for evaluation based on Piccolo, a 32$\times$32 SPAD sensor developed in our lab. Four quantized GRU cores, capable of processing up to 4 million photons per second, are deployed on a Xilinx Kintex-7 FPGA. Powered by the GRU cores, the FLI setup retrieves real-time fluorescence lifetime images at up to 10 frames per second. The proposed FLI system is promising for many important biomedical applications, ranging from biological imaging of fast-moving cells to fluorescence-assisted diagnosis and surgery.
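To make the contrast between the two estimation styles concrete, the sketch below shows a center-of-mass (CMM) estimate of a mono-exponential lifetime alongside a toy GRU that regresses the lifetime directly from a raw timestamp sequence, with no histogram in between. This is a minimal illustration only: the hidden size, normalization, and the absence of training or quantization are assumptions of this sketch and do not reflect the paper's deployed FPGA cores.

\begin{verbatim}
import numpy as np
import torch
import torch.nn as nn

def cmm_lifetime(timestamps, t0=0.0):
    """Center-of-mass lifetime estimate for a mono-exponential decay.

    Assumes arrival times (ns) start at t0 and that the measurement
    window is much longer than the lifetime, so the finite-window
    bias of CMM is negligible in this sketch.
    """
    return np.mean(timestamps) - t0

class GRULifetimeEstimator(nn.Module):
    """Illustrative GRU that consumes raw photon timestamps one by one
    and regresses the lifetime from its final hidden state.
    Hypothetical layer sizes; not the deployed quantized configuration."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, timestamps):
        # timestamps: (batch, n_photons, 1) normalized arrival times
        _, h = self.gru(timestamps)
        return self.head(h[-1]).squeeze(-1)

# Example: 500 photons drawn from a 2.5 ns decay
ts = np.random.exponential(scale=2.5, size=500)
print(f"CMM estimate: {cmm_lifetime(ts):.2f} ns")

model = GRULifetimeEstimator()  # untrained, output not yet meaningful
ts_tensor = torch.tensor(ts, dtype=torch.float32).view(1, -1, 1)
print(f"GRU output (untrained): {model(ts_tensor).item():.2f}")
\end{verbatim}

Because the GRU updates its hidden state photon by photon, a hardware implementation can process timestamps as they arrive and only read out one lifetime value per pixel, which is the property the abstract attributes the reduced data transfer rate to.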