Abstract:To enhance therapeutic outcomes from a pharmacological perspective, we propose MiranDa, designed for medication recommendation, which is the first actionable model capable of providing the estimated length of stay in hospitals (ELOS) as counterfactual outcomes that guide clinical practice and model training. In detail, MiranDa emulates the educational trajectory of doctors through two gradient-scaling phases shifted by ELOS: an Evidence-based Training Phase that utilizes supervised learning and a Therapeutic Optimization Phase grounds in reinforcement learning within the gradient space, explores optimal medications by perturbations from ELOS. Evaluation of the Medical Information Mart for Intensive Care III dataset and IV dataset, showcased the superior results of our model across five metrics, particularly in reducing the ELOS. Surprisingly, our model provides structural attributes of medication combinations proved in hyperbolic space and advocated "procedure-specific" medication combinations. These findings posit that MiranDa enhanced medication efficacy. Notably, our paradigm can be applied to nearly all medical tasks and those with information to evaluate predicted outcomes. The source code of the MiranDa model is available at https://github.com/azusakou/MiranDa.