Efficient discovery of emotion states of speakers in a multi-party conversation is highly important to design human-like conversational agents. During the conversation, the cognitive state of a speaker often alters due to certain past utterances, which may lead to a flip in her emotion state. Therefore, discovering the reasons (triggers) behind one's emotion flip during conversation is important to explain the emotion labels of individual utterances. In this paper, along with addressing the task of emotion recognition in conversations (ERC), we introduce a novel task -- Emotion Flip Reasoning (EFR) that aims to identify past utterances which have triggered one's emotion state to flip at a certain time. We propose a masked memory network to address the former and a Transformer-based network for the latter task. To this end, we consider MELD, a benchmark emotion recognition dataset in multi-party conversations for the task of ERC and augment it with new ground-truth labels for EFR. An extensive comparison with four state-of-the-art models suggests improved performances of our models for both the tasks. We further present anecdotal evidences and both qualitative and quantitative error analyses to support the superiority of our models compared to the baselines.