Abstract:Encouraged by the success of deep learning in a variety of domains, we investigate a novel application of its methods on the effectiveness of detecting user confusion in eye-tracking data. We introduce an architecture that uses RNN and CNN sub-models in parallel to take advantage of the temporal and visuospatial aspects of our data. Experiments with a dataset of user interactions with the ValueChart visualization tool show that our model outperforms an existing model based on Random Forests resulting in a 22% improvement in combined sensitivity & specificity.