Transformers have become the de-facto standard in the natural language processing (NLP) field. They have also gained momentum in computer vision and other domains. Transformers can enable artificial intelligence (AI) models to dynamically focus on certain parts of their input and thus reason more effectively. Inspired by the success of transformers, we adopted this technique to predict strategic flight departure demand in multiple horizons. This work was conducted in support of a MITRE-developed mobile application, Pacer, which displays predicted departure demand to general aviation (GA) flight operators so they can have better situation awareness of the potential for departure delays during busy periods. Field demonstrations involving Pacer's previously designed rule-based prediction method showed that the prediction accuracy of departure demand still has room for improvement. This research strives to improve prediction accuracy from two key aspects: better data sources and robust forecasting algorithms. We leveraged two data sources, Aviation System Performance Metrics (ASPM) and System Wide Information Management (SWIM), as our input. We then trained forecasting models with temporal fusion transformer (TFT) for five different airports. Case studies show that TFTs can perform better than traditional forecasting methods by large margins, and they can result in better prediction across diverse airports and with better interpretability.