Abstract: Predicting crash likelihood in complex driving environments is essential for improving traffic safety and advancing autonomous driving. Previous studies have used statistical models and deep learning to predict crashes from semantic, contextual, or driving features, but none have examined the combined influence of these factors, which this study terms roadway complexity. This paper introduces a two-stage framework that integrates roadway complexity features into crash prediction. In the first stage, an encoder extracts hidden contextual information from these features to generate complexity-infused features. In the second stage, both the original and the complexity-infused features are used to predict crash likelihood. Accuracy is 87.98% with the original features alone and rises to 90.15% when the complexity-infused features are added. Ablation studies confirm that combining semantic, driving, and contextual features yields the best results, underscoring their role in capturing roadway complexity. Additionally, complexity index annotations generated by Large Language Models outperform those from Amazon Mechanical Turk. This highlights the potential of automated annotation tools for building accurate, scalable crash prediction systems.