Within the domain of short video recommendation, predicting users' watch time is a critical but challenging task. Prevailing deterministic solutions obtain accurate debiased statistical models, yet they neglect the intrinsic uncertainty inherent in user environments. In our observation, we found that this uncertainty could potentially limit these methods' accuracy in watch-time prediction on our online platform, despite that we have employed numerous features and complex network architectures. Consequently, we believe that a better solution is to model the conditional distribution of this uncertain watch time. In this paper, we introduce a novel estimation technique -- Conditional Quantile Estimation (CQE), which utilizes quantile regression to capture the nuanced distribution of watch time. The learned distribution accounts for the stochastic nature of users, thereby it provides a more accurate and robust estimation. In addition, we also design several strategies to enhance the quantile prediction including conditional expectation, conservative estimation, and dynamic quantile combination. We verify the effectiveness of our method through extensive offline evaluations using public datasets as well as deployment in a real-world video application with over 300 million daily active users.