Drought is one of the most destructive and expensive natural disasters, severely impacting natural resources and risks by depleting water resources and diminishing agricultural yields. Under climate change, accurately predicting drought is critical for mitigating drought-induced risks. However, the intricate interplay among the physical and biological drivers that regulate droughts limits the predictability and understanding of drought, particularly at a subseasonal to seasonal (S2S) time scale. While deep learning has been demonstrated with potential in addressing climate forecasting challenges, its application to drought prediction has received relatively less attention. In this work, we propose a new dataset, DroughtSet, which integrates relevant predictive features and three drought indices from multiple remote sensing and reanalysis datasets across the contiguous United States (CONUS). DroughtSet specifically provides the machine learning community with a new real-world dataset to benchmark drought prediction models and more generally, time-series forecasting methods. Furthermore, we propose a spatial-temporal model SPDrought to predict and interpret S2S droughts. Our model learns from the spatial and temporal information of physical and biological features to predict three types of droughts simultaneously. Multiple strategies are employed to quantify the importance of physical and biological features for drought prediction. Our results provide insights for researchers to better understand the predictability and sensitivity of drought to biological and physical conditions. We aim to contribute to the climate field by proposing a new tool to predict and understand the occurrence of droughts and provide the AI community with a new benchmark to study deep learning applications in climate science.