Abstract:Can we predict if an early stage cancer patient is at high risk of developing distant metastasis and what clinicopathological factors are associated with such a risk? In this paper, we propose a ranking based censoring-aware machine learning model for answering such questions. The proposed model is able to generate an interpretable formula for risk stratifi-cation using a minimal number of clinicopathological covariates through L1-regulrization. Using this approach, we analyze the association of time to distant metastasis (TTDM) with various clinical parameters for early stage, luminal (ER+ or HER2-) breast cancer patients who received endocrine therapy but no chemotherapy (n = 728). The TTDM risk stratification formula obtained using the proposed approach is primarily based on mitotic score, histolog-ical tumor type and lymphovascular invasion. These findings corroborate with the known role of these covariates in increased risk for distant metastasis. Our analysis shows that the proposed risk stratification formula can discriminate between cases with high and low risk of distant metastasis (p-value < 0.005) and can also rank cases based on their time to distant metastasis with a concordance-index of 0.73.