In order to create effective storytelling agents three fundamental questions must be answered: first, is a physically embodied agent preferable to a virtual agent or a voice-only narration? Second, does a human voice have an advantage over a synthesised voice? Third, how should the emotional trajectory of the different characters in a story be related to a storyteller's facial expressions during storytelling time, and how does this correlate with the apparent emotions on the faces of the listeners? The results of two specially designed studies indicate that the physically embodied robot produces more attention to the listener as compared to a virtual embodiment, that a human voice is preferable over the current state of the art of text-to-speech, and that there is a complex yet interesting relation between the emotion lines of the story, the facial expressions of the narrating agent, and the emotions of the listener, and that the empathising of the listener is evident through its facial expressions. This work constitutes an important step towards emotional storytelling robots that can observe their listeners and adapt their style in order to maximise their effectiveness.