Emojis are widely used in online social networks to express emotions, attitudes, and opinions. As emotional-oriented characters, emojis can be modeled as important features of emotions towards the recipient or subject for sentiment analysis. However, existing methods mainly take emojis as heuristic information that fails to resolve the problem of ambiguity noise. Recent researches have utilized emojis as an independent input to classify text sentiment but they ignore the emotional impact of the interaction between text and emojis. It results that the emotional semantics of emojis cannot be fully explored. In this paper, we propose an emoji-based co-attention network that learns the mutual emotional semantics between text and emojis on microblogs. Our model adopts the co-attention mechanism based on bidirectional long short-term memory incorporating the text and emojis, and integrates a squeeze-and-excitation block in a convolutional neural network classifier to increase its sensitivity to emotional semantic features. Experimental results show that the proposed method can significantly outperform several baselines for sentiment analysis on short texts of social media.