The large volume of tourism-related data poses several challenges for tourism demand forecasting, including data deficiencies, multicollinearity and long computation times. To address these challenges, this study proposes a bagging-based multivariate ensemble deep learning approach that integrates stacked autoencoders and kernel-based extreme learning machines (B-SAKE). We forecast tourist arrivals in Beijing from four origin countries using historical data on tourist arrivals, economic indicators and online tourist behavior variables. The results for the four origin countries suggest that the proposed B-SAKE approach outperforms the benchmark models in terms of horizontal accuracy, directional accuracy and statistical significance. Both bagging and stacked autoencoders can improve the forecasting performance of the models. Moreover, evaluation under a multi-step-ahead forecasting scheme yields consistent results.
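To make the ensemble idea concrete, the following is a minimal sketch of bagging applied to kernel-based extreme learning machines, not the authors' implementation. It assumes an RBF kernel, a plain row-wise bootstrap, simple averaging of the base forecasts, and illustrative hyperparameters (`C`, `gamma`, `n_estimators`). The stacked autoencoder stage that would compress the input variables beforehand is omitted. All function and class names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelELM:
    """Kernel extreme learning machine in its closed-form (ridge-like) solution."""

    def __init__(self, C=100.0, gamma=1.0):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_, self.X_, self.gamma)
        # Output weights: solve (K + I / C) beta = y.
        self.beta_ = np.linalg.solve(K + np.eye(len(K)) / self.C, np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        return rbf_kernel(np.asarray(X, dtype=float), self.X_, self.gamma) @ self.beta_


def bagged_kelm_predict(X_train, y_train, X_test, n_estimators=20, seed=0, **kelm_params):
    """Average the forecasts of KELMs trained on bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        model = KernelELM(**kelm_params).fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)
```

In a multivariate setting, each row of `X_train` would hold lagged arrivals together with the economic and online behavior variables (or their autoencoder-compressed features), and `y_train` the arrivals to be forecast. For time series, a block bootstrap may be preferable to the row-wise resampling assumed here, since it preserves temporal dependence.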