A novel Twitter context aided content caching (TAC) framework is proposed for enhancing the caching efficiency by taking advantage of the legibility and massive volume of Twitter data. For the purpose of promoting the caching efficiency, three machine learning models are proposed to predict latent events and events popularity, utilizing collect Twitter data with geo-tags and geographic information of the adjacent base stations (BSs). Firstly, we propose a latent Dirichlet allocation (LDA) model for latent events forecasting taking advantage of the superiority of the LDA model in natural language processing (NLP). Then, we conceive long short-term memory (LSTM) with skip-gram embedding approach and LSTM with continuous skip-gram-Geo-aware embedding approach for the events popularity forecasting. Lastly, we associate the predicted latent events and the popularity of the events with the caching strategy. Extensive practical experiments demonstrate that: (1) The proposed TAC framework outperforms the conventional caching framework and is capable of being employed in practical applications thanks to the associating ability with public interests. (2) The proposed LDA approach conserves superiority for natural language processing (NLP) in Twitter data. (3) The perplexity of the proposed skip-gram-based LSTM is lower compared with the conventional LDA approach. (4) Evaluation of the model demonstrates that the hit rates of tweets of the model vary from 50% to 65% and the hit rate of the caching contents is up to approximately 75\% with smaller caching space compared to conventional algorithms.