An electroencephalogram (EEG) signal is currently accepted as a standard for automatic sleep staging. Lately, Near-human accuracy in automated sleep staging has been achievable by Deep Learning (DL) based approaches, enabling multi-fold progress in this area. However, An extensive and expensive clinical setup is required for EEG based sleep staging. Additionally, the EEG setup being obtrusive in nature and requiring an expert for setup adds to the inconvenience of the subject under study, making it adverse in the point of care setting. An unobtrusive and more suitable alternative to EEG is Electrocardiogram (ECG). Unsurprisingly, compared to EEG in sleep staging, its performance remains sub-par. In order to take advantage of both the modalities, transferring knowledge from EEG to ECG is a reasonable approach, ultimately boosting the performance of ECG based sleep staging. Knowledge Distillation (KD) is a promising notion in DL that shares knowledge from a superior performing but usually more complex teacher model to an inferior but compact student model. Building upon this concept, a cross-modality KD framework assisting features learned through models trained on EEG to improve ECG-based sleep staging performance is proposed. Additionally, to better understand the distillation approach, extensive experimentation on the independent modules of the proposed model was conducted. Montreal Archive of Sleep Studies (MASS) dataset consisting of 200 subjects was utilized for this study. The results from the proposed model for weighted-F1-score in 3-class and 4-class sleep staging showed a 13.40 \% and 14.30 \% improvement, respectively. This study demonstrates the feasibility of KD for single-channel ECG based sleep staging's performance enhancement in 3-class (W-R-N) and 4-class (W-R-L-D) classification.