Biological neurons use spikes to process and learn temporally dynamic inputs in an energy and computationally efficient way. However, applying the state-of-the-art gradient-based supervised algorithms to spiking neural networks (SNN) is a challenge due to the non-differentiability of the activation function of spiking neurons. Employing surrogate gradients is one of the main solutions to overcome this challenge. Although SNNs naturally work in the temporal domain, recent studies have focused on developing SNNs to solve static image categorization tasks. In this paper, we employ a surrogate gradient descent learning algorithm to recognize twelve human hand gestures recorded by dynamic vision sensor (DVS) cameras. The proposed SNN could reach 97.2% recognition accuracy on test data.