Using hand gestures to answer a call or to control the radio while driving a car, is nowadays an established feature in more expensive cars. High resolution time-of-flight cameras and powerful embedded processors usually form the heart of these gesture recognition systems. This however comes with a price tag. We therefore investigate the possibility to design an algorithm that predicts hand gestures using a cheap low-resolution thermal camera with only 32x24 pixels, which is light-weight enough to run on a low-cost processor. We recorded a new dataset of over 1300 video clips for training and evaluation and propose a light-weight low-latency prediction algorithm. Our best model achieves 95.9% classification accuracy and 83% mAP detection accuracy while its processing pipeline has a latency of only one frame.