In this paper, we introduce a new dataset for student engagement detection and localization. The digital revolution has transformed traditional teaching procedures; as a result, analysis of student engagement in an e-learning environment would facilitate effective task accomplishment and learning. Well-known social cues of engagement/disengagement can be inferred from facial expressions, body movements, and gaze patterns. Students' responses to various stimulus videos are recorded, and important cues are extracted to estimate variations in engagement level. We study the association of a subject's behavioral cues with his/her engagement level, as annotated by labelers. We then localize engaging/non-engaging parts in the stimulus videos using a deep multiple instance learning based framework, which can give useful insight into designing video material for Massive Open Online Courses (MOOCs). Recognizing the lack of any publicly available dataset in the domain of user engagement, we create a new `in the wild' dataset to study the subject engagement problem. The dataset contains 195 videos captured from 78 subjects, about 16.5 hours of recording in total. We present detailed baseline results using different classifiers, ranging from traditional machine learning to deep learning based approaches. A subject-independent analysis is performed so that the results generalize to new users. The problem of engagement prediction is modeled as a weakly supervised learning problem. The dataset is manually annotated for four levels of engagement, independently by different labelers, and correlation studies between the annotated labels and the labels predicted by different classifiers are reported. This dataset is an effort to facilitate research in various e-learning environments such as intelligent tutoring systems, MOOCs, and others.