Autism Spectrum Disorder (ASD) can often make life difficult for children, so early diagnosis is necessary for proper treatment and care. In this work, we consider the problem of detecting or classifying ASD in children to aid medical professionals in early detection. To this end, we develop a deep learning model that analyzes video clips of children reacting to sensory stimuli, with the intent of capturing key differences in reactions and behavior between ASD and non-ASD patients. Many works in ASD classification rely on MRI data, which requires expensive specialized equipment; in contrast, our method needs only a powerful but relatively cheaper GPU, a decent computer setup, and a video camera for inference. Results on our data show that our model can generalize well and can capture key differences in the distinct movements of the patients, despite the limited amount of data for a deep learning problem, the limited temporal information available to the model as input, and noise due to movement.