Emojis occur extremely frequently in computer-mediated communication on social media platforms. Many emojis are used to strengthen emotional expression, and emojis that co-occur in a sentence also tend to be strongly connected in sentiment. However, most studies on emoji representation learning rely only on the fixed descriptions provided by the Unicode Consortium and do not consider actual usage scenarios. In sentiment analysis, many researchers also ignore the emotional impact of the interaction between text and emojis. As a result, the emotional semantics of emojis are not fully exploited. In this work, we propose EmoGraph2vec, a method for learning emoji representations, and design an emoji-aware co-attention network that learns the mutual emotional semantics between text and emojis in short social media texts. In EmoGraph2vec, we build an emoji co-occurrence network from real social media data and enrich its semantic information with the external knowledge base EmojiNet to obtain emoji node embeddings. Our model uses a co-attention mechanism to combine text and emojis, and integrates a squeeze-and-excitation (SE) block into a convolutional neural network that serves as the classifier. Finally, we apply transfer learning to speed up convergence and achieve higher accuracy. Experimental results show that the proposed model outperforms several baselines for sentiment analysis on benchmark datasets. Additionally, we conduct a series of ablation and comparison experiments to investigate the effectiveness of our model.
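As an illustration of the emoji co-occurrence network mentioned above, the following minimal sketch counts how often pairs of emojis appear in the same post. It is not the EmoGraph2vec implementation: the function name `build_cooccurrence_edges`, the per-post co-occurrence window, the use of raw counts as edge weights, and the toy vocabulary are all assumptions made for illustration. It also treats each emoji as a single code point, so multi-code-point sequences would need a proper emoji tokenizer.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(posts, emoji_vocab):
    """Count how often each unordered emoji pair appears in the same post.

    Hypothetical helper: returns a Counter mapping (emoji_a, emoji_b)
    tuples, sorted by code point, to co-occurrence counts.
    """
    edges = Counter()
    for post in posts:
        # Distinct emojis in this post, sorted so that (a, b) and (b, a)
        # always map to the same edge key.
        present = sorted({ch for ch in post if ch in emoji_vocab})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Toy data, not taken from any real dataset.
    vocab = {"😀", "🎉", "😍", "😢"}
    posts = ["great day 😀🎉", "so happy 😀🎉😍", "sad 😢", "😀😍"]
    for (a, b), weight in build_cooccurrence_edges(posts, vocab).items():
        print(a, b, weight)
```

The resulting weighted edge list could then be passed to a node embedding method of the reader's choice; which embedding method and which EmojiNet enrichment step are used is not specified here.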