We present a multi-stage pipeline for simple gesture recognition. The novelty of our approach lies in the combination of different technologies, resulting in the first real-time system to date that jointly extracts skeletons and recognises gestures on a Pepper robot. For this task, Pepper has been augmented with an embedded GPU for running deep CNNs and with a fish-eye camera to capture whole-scene interactions. We show in this article that real-world scenarios are challenging and that state-of-the-art approaches struggle with unknown human gestures. We present a way to handle such cases.