Short-term passenger demand forecasting is of great importance to the on-demand ride service platform, which can incentivize vacant cars moving from over-supply regions to over-demand regions. The spatial dependences, temporal dependences, and exogenous dependences need to be considered simultaneously, however, which makes short-term passenger demand forecasting challenging. We propose a novel deep learning (DL) approach, named the fusion convolutional long short-term memory network (FCL-Net), to address these three dependences within one end-to-end learning architecture. The model is stacked and fused by multiple convolutional long short-term memory (LSTM) layers, standard LSTM layers, and convolutional layers. The fusion of convolutional techniques and the LSTM network enables the proposed DL approach to better capture the spatio-temporal characteristics and correlations of explanatory variables. A tailored spatially aggregated random forest is employed to rank the importance of the explanatory variables. The ranking is then used for feature selection. The proposed DL approach is applied to the short-term forecasting of passenger demand under an on-demand ride service platform in Hangzhou, China. Experimental results, validated on real-world data provided by DiDi Chuxing, show that the FCL-Net achieves better predictive performance than traditional approaches including both classical time-series prediction models and neural network based algorithms (e.g., artificial neural network and LSTM). This paper is one of the first DL studies to forecast the short-term passenger demand of an on-demand ride service platform by examining the spatio-temporal correlations.