Abstract: This work presents a novel approach to temporally consistent mitral annulus landmark localization in echocardiography videos using sparse annotations. Our method introduces a self-supervised loss term that enforces temporal consistency between neighboring frames, which smooths landmark positions over time and improves measurement accuracy. Additionally, we incorporate realistic field-of-view augmentations to improve the recognition of missing anatomical landmarks. We evaluate our approach on both a public and a private dataset and demonstrate significant improvements in Mitral Annular Plane Systolic Excursion (MAPSE) calculations and overall landmark tracking stability. The method achieves a mean absolute MAPSE error of 1.81 $\pm$ 0.14 mm, an annulus size error of 2.46 $\pm$ 0.31 mm, and a landmark localization error of 2.48 $\pm$ 0.07 mm. Finally, it achieves a ROC-AUC of 0.99 for the recognition of missing landmarks.
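
To make the temporal consistency idea concrete, the following is a minimal sketch of one possible penalty on frame-to-frame landmark motion. The abstract does not specify the form of the loss, so the function name `temporal_consistency_penalty`, the `(T, K, 2)` array layout, and the squared-distance formulation are all assumptions made for illustration, not the method evaluated here.

```python
import numpy as np


def temporal_consistency_penalty(landmarks: np.ndarray) -> float:
    """Mean squared displacement of landmarks between neighboring frames.

    Hypothetical illustration only: the exact loss is not given in the abstract.

    landmarks: array of shape (T, K, 2) holding predicted (x, y) coordinates
               for K landmarks over T consecutive video frames.
    """
    if landmarks.ndim != 3 or landmarks.shape[-1] != 2:
        raise ValueError("expected an array of shape (T, K, 2)")
    if landmarks.shape[0] < 2:
        raise ValueError("need at least two frames")

    # Displacement of each landmark from frame t to frame t + 1: (T - 1, K, 2).
    diffs = landmarks[1:] - landmarks[:-1]
    # Squared Euclidean distance per landmark and frame pair: (T - 1, K).
    sq_dist = np.sum(diffs ** 2, axis=-1)
    return float(np.mean(sq_dist))


# Two landmarks drifting smoothly by one pixel per frame along x.
smooth = np.zeros((4, 2, 2))
smooth[:, :, 0] = np.arange(4)[:, None]

# The same track with alternating +/- 2 pixel jitter added along y.
jitter = smooth.copy()
jitter[:, :, 1] = np.array([2.0, -2.0, 2.0, -2.0])[:, None]

smooth_penalty = temporal_consistency_penalty(smooth)
jitter_penalty = temporal_consistency_penalty(jitter)
```

A penalty of this kind needs no labels, so it can be computed on unannotated frames between sparse annotations. Because it also penalizes genuine annulus motion, in practice it would presumably be weighted against the supervised term; that weighting is likewise an assumption.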