Abstract: Recent studies on appearance-based gaze estimation indicate the ability of neural networks to decode gaze information from facial images that also encompass pose information. In this paper, we propose Gaze-Net: a capsule network capable of decoding, representing, and estimating gaze information from ocular region images. We evaluate the proposed system using two publicly available datasets, MPIIGaze (200,000+ images in the wild) and Columbia Gaze (5,000+ images of users with 21 gaze directions observed from 5 camera angles/positions). Our model achieves a Mean Absolute Error (MAE) of 2.84$^\circ$ for the combined angle error in within-dataset estimation on the MPIIGaze dataset. For cross-dataset estimation on the Columbia Gaze dataset, the model achieves an MAE of 10.04$^\circ$, which is reduced to 5.9$^\circ$ through transfer learning. These results are promising, with implications for using commodity webcams to develop low-cost, multi-user gaze tracking systems.