We introduce Language World Models, a class of language-conditional generative model which interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field-of-view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model beliefs to gain insight into how the communication channel is utilized.