Abstract: We develop Few-Shot Learning models that recognize five or ten different dynamic hand gestures. The gestures are arbitrarily interchangeable: the model only needs to be provided with one, two, or five examples per hand gesture. All models use the Relation Network (RN) Few-Shot Learning architecture, with Long Short-Term Memory cells forming the backbone. The models take as input hand reference points extracted from RGB video sequences of the Jester dataset, which was modified to contain 190 different types of hand gestures. Results show accuracies of up to 88.8% for the recognition of five and up to 81.2% for ten dynamic hand gestures. The research also sheds light on the potential effort savings of using a Few-Shot Learning approach instead of a traditional Deep Learning approach to detect dynamic hand gestures. Savings are defined as the number of additional observations required when a Deep Learning model, rather than a Few-Shot Learning model, is trained on new hand gestures. The difference in the total number of observations required to achieve approximately the same accuracy indicates potential savings of up to 630 observations for five and up to 1260 observations for ten hand gestures. Since labeling video recordings of hand gestures requires significant effort, these savings can be considered substantial.
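
The N-way K-shot setting described above (five or ten gestures, with one, two, or five examples each) can be illustrated with a minimal episode-sampling sketch. This is not the authors' implementation: the function name `sample_episode`, the query-set size, and the synthetic label list (190 classes with 20 samples each) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode.

    Returns (support, query), each a list of (sample_index, class_label) pairs.
    Hypothetical helper for illustration only.
    """
    # Group sample indices by their gesture class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Keep only classes with enough samples for support plus query.
    eligible = sorted(
        c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query
    )
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient samples")

    # Pick the gesture classes for this episode, then split their samples.
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


# Synthetic stand-in for the dataset: 190 gesture classes, 20 samples each (assumed).
labels = [c for c in range(190) for _ in range(20)]
rng = random.Random(0)

# A 5-way 2-shot episode with 3 query samples per class (query size is an assumption).
support, query = sample_episode(labels, n_way=5, k_shot=2, n_query=3, rng=rng)
```

In a Relation Network, each query sample would be compared against the support samples of every class in the episode, so swapping in new gestures only requires a new support set rather than retraining.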