Abstract:Eye movement prediction is a promising area of research to compensate for the latency introduced by eye-tracking systems in virtual reality devices. In this study, we comprehensively analyze the complexity of the eye movement prediction task associated with subjects. We use three fundamentally different models within the analysis: the lightweight Long Short-Term Memory network (LSTM), the transformer-based network for multivariate time series representation learning (TST), and the Oculomotor Plant Mathematical Model wrapped in the Kalman Filter framework (OPKF). Each solution is assessed following a sample-to-event evaluation strategy and employing the new event-to-subject metrics. Our results show that the different models maintained similar prediction performance trends pertaining to subjects. We refer to these outcomes as per-subject complexity since some subjects' data pose a more significant challenge for models. Along with the detailed correlation analysis, this report investigates the source of the per-subject complexity and discusses potential solutions to overcome it.
Abstract:Photosensor oculography (PS-OG) eye movement sensors offer desirable performance characteristics for integration within wireless head mounted devices (HMDs), including low power consumption and high sampling rates. To address the known performance degradation of these sensors due to HMD shifts, various machine learning techniques have been proposed for mapping sensor outputs to gaze location. This paper advances the understanding of a recently introduced convolutional neural network designed to provide shift invariant gaze mapping within a specified range of sensor translations. Performance is assessed for shift training examples which better reflect the distribution of values that would be generated through manual repositioning of the HMD during a dedicated collection of training data. The network is shown to exhibit comparable accuracy for this realistic shift distribution versus a previously considered rectangular grid, thereby enhancing the feasibility of in-field set-up. In addition, this work further demonstrates the practical viability of the proposed initialization process by demonstrating robust mapping performance versus training data scale. The ability to maintain reasonable accuracy for shifts extending beyond those introduced during training is also demonstrated.