Conventional automatic incident detection (AID) has relied almost exclusively on incident reports for training and evaluation. However, these reports suffer from a number of issues, including delayed reporting, inaccurate descriptions, false alarms, missing reports, and incidents that do not noticeably affect traffic. Relying on these reports to train or calibrate AID models hinders their ability to detect traffic anomalies effectively and in a timely manner, and can even cause convergence problems during model training. Moreover, conventional AID models are not inherently designed to capture the early indicators of generic incidents, and it remains unclear how far in advance an AID model can report an incident. AID applications in the literature are also spatially limited, because most models rely on data that is available only for specific test road segments. To address these problems, we propose a deep learning framework that leverages prior domain knowledge together with model design strategies. This allows the model to detect a broader range of anomalies: not only incidents that significantly affect traffic flow, but also the early characteristics of incidents and historically unreported anomalies. The model is specifically designed to target early-stage detection and prediction of incidents. Additionally, unlike most conventional AID studies, we use widely available data, which improves the scalability of our method. Experimental results across numerous road segments on different maps show that our model achieves more effective and earlier anomaly detection. Rather than stacking or tweaking existing deep learning models, our framework focuses on model design and training strategies that improve early detection performance.