This paper introduces a framework of gesture recognition operating on the output of an event based camera using the computational resources of a mobile phone. We will introduce a new development around the concept of time-surfaces modified and adapted to run on the limited computational resources of a mobile platform. We also introduce a new method to remove dynamically backgrounds that makes full use of the high temporal resolution of event-based cameras. We assess the performances of the framework by operating on several dynamic scenarios in uncontrolled lighting conditions indoors and outdoors. We also introduce a new publicly available event-based dataset for gesture recognition selected through a clinical process to allow human-machine interactions for the visually-impaired and the elderly. We finally report comparisons with prior works that tackled event-based gesture recognition reporting comparable if not superior results if taking into account the limited computational and memory constraints of the used hardware.