From navigation systems to smart assistants, we communicate with a variety of AI systems on a daily basis. At the core of such human-AI communication, we convey our understanding of the AI's capabilities to the AI through utterances of varying complexity, and the AI conveys its understanding of our needs and goals to us through its outputs. However, this communication process is prone to failure for two reasons: the AI might have a wrong understanding of the user, and the user might have a wrong understanding of the AI. To enhance mutual understanding in human-AI communication, we propose the Mutual Theory of Mind (MToM) framework, inspired by the basic human capability of "Theory of Mind." In this paper, we discuss the motivation for the MToM framework and its three key components, which continuously shape mutual understanding during three stages of human-AI communication. We then describe a case study inspired by the MToM framework to demonstrate how the framework can guide the design and understanding of human-AI communication.