https://anonymous.4open.science/r/TrialDura-F196
The clinical trial process, also known as drug development, is an indispensable step toward the development of new treatments. The major objective of interventional clinical trials is to assess the safety and effectiveness of drug-based treatment in treating certain diseases in the human body. However, clinical trials are lengthy, labor-intensive, and costly. The duration of a clinical trial is a crucial factor that influences overall expenses. Therefore, effective management of the timeline of a clinical trial is essential for controlling the budget and maximizing the economic viability of the research. To address this issue, We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. Then, we encode them into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years compared to the other models, indicating more accurate clinical trial duration prediction. Publicly available code can be found at