Abstract: Accurate prediction of stream water temperature is critical for monitoring and understanding biogeochemical and ecological processes. Stream temperature is affected by weather patterns (such as solar radiation) and by water flowing through the stream network. It can also be substantially affected by water releases from man-made reservoirs to downstream segments. In this paper, we propose a heterogeneous recurrent graph model to represent these interacting processes that underlie stream-reservoir networks and to improve the prediction of water temperature in all river segments within a network. Because release data may be unavailable for certain reservoirs, we further develop a data assimilation mechanism that adjusts the deep learning model states to correct for the prediction bias caused by reservoir releases. Using the adjusted states to improve future predictions requires a well-trained temporal modeling component, so we also introduce a simulation-based pre-training strategy to enhance model training. Our evaluation in the Delaware River Basin shows that the proposed method outperforms multiple existing methods. We study the effect of the data assimilation mechanism extensively under different scenarios. Moreover, we show that the proposed method with the pre-training strategy can still produce good predictions even with limited training data.
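The idea of adjusting model states with observations can be illustrated with a toy example. The sketch below is not the model from the paper: it replaces the recurrent graph network with a hypothetical scalar recurrence `h = a*h + b*x`, and the names `run_with_assimilation`, `a`, `b`, and `gain` are assumptions made for illustration only. It shows how correcting the state on a day with an observation changes the prediction for the following day, which is why the temporal component must be reliable.

```python
def run_with_assimilation(inputs, observations, a=0.5, b=0.5, gain=1.0, h0=0.0):
    """Toy state-correction loop (illustrative stand-in, not the paper's model).

    inputs: one driver value per time step.
    observations: one entry per time step, None where nothing was observed.
    Returns the prediction issued at each step before that step's
    observation is used.
    """
    h = h0
    preds = []
    for x, obs in zip(inputs, observations):
        h = a * h + b * x              # temporal update (stand-in for a recurrent cell)
        preds.append(h)                # predict before seeing today's observation
        if obs is not None:
            h = h + gain * (obs - h)   # pull the state toward the observation
    return preds


inputs = [10.0, 10.0, 10.0]
with_da = run_with_assimilation(inputs, [None, 5.0, None])
without_da = run_with_assimilation(inputs, [None, None, None])
print(with_da, without_da)
```

With `gain=1.0` the state is replaced by the observation; smaller values blend the two. The corrected state only helps on later days because the recurrence carries it forward.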
Abstract: Effective training of advanced ML models requires large amounts of labeled data, which are often scarce in scientific problems given the substantial human labor and material cost of collecting them. This poses a challenge in determining when and where to deploy measuring instruments (e.g., in-situ sensors) to collect labeled data efficiently. This problem differs from traditional pool-based active learning settings in that each labeling decision has to be made immediately after we observe the input data, which arrive as a time series. In this paper, we develop a real-time active learning method that uses spatial and temporal contextual information to select representative query samples in a reinforcement learning framework. To reduce the need for large training data, we further propose to transfer the policy learned from simulation data generated by existing physics-based models. We demonstrate the effectiveness of the proposed method by predicting streamflow and water temperature in the Delaware River Basin given a limited budget for collecting labeled data. We further study the spatial and temporal distribution of the selected samples to verify the method's ability to select informative samples over space and time.
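The difference between pool-based and real-time selection can be made concrete with a small sketch. This is not the reinforcement learning policy described in the abstract: it uses a hypothetical fixed-threshold rule, and the names `stream_select`, `pool_select`, `scores`, and `threshold` are assumptions for illustration. The point is only that a streaming selector must commit to each decision as the sample arrives, whereas a pool-based selector can rank everything first.

```python
def stream_select(scores, budget, threshold):
    """Request labels online: decide at each step, never revisit a decision."""
    chosen = []
    for t, score in enumerate(scores):
        if len(chosen) >= budget:
            break                      # budget exhausted, stop querying
        if score >= threshold:
            chosen.append(t)           # commit immediately
    return chosen


def pool_select(scores, budget):
    """Pool-based baseline: see all scores, then take the top `budget`."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical informativeness scores for five consecutive days.
scores = [0.1, 0.9, 0.8, 0.2, 0.95]
online = stream_select(scores, budget=2, threshold=0.5)
offline = pool_select(scores, budget=2)
print(online, offline)
```

Here the online rule spends its budget on days 1 and 2 and cannot label the more informative day 4, which the pool-based baseline picks. A learned policy aims to make such immediate decisions better than a fixed threshold does.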
Abstract: This paper proposes a physics-guided machine learning approach that combines advanced machine learning models and physics-based models to improve the prediction of water flow and temperature in river networks. We first build a recurrent graph network model to capture the interactions among multiple segments in the river network. We then present a pre-training technique that transfers knowledge from physics-based models to initialize the machine learning model and learn the physics of streamflow and thermodynamics. We also propose a new loss function that balances performance across river segments. We demonstrate the effectiveness of the proposed method in predicting temperature and streamflow in a subset of the Delaware River Basin. In particular, we show that the proposed method brings a 33%/14% improvement over the state-of-the-art physics-based model and a 24%/14% improvement over traditional machine learning models (e.g., Long Short-Term Memory neural networks) in temperature/streamflow prediction when only very sparse (0.1%) observation data are used for training. The proposed method also performs better when generalized to different seasons or to river segments with different streamflow ranges.
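The abstract does not state the form of the segment-balancing loss, so the following is only one plausible reading, written as a minimal sketch: compute an RMSE per river segment over the observed time steps and average those values, so that a densely observed segment does not dominate. The function name `segment_balanced_rmse` and the use of NaN for missing observations are assumptions for illustration.

```python
import math


def segment_balanced_rmse(pred, obs):
    """Mean of per-segment RMSEs (an assumed form, not the paper's loss).

    pred, obs: lists of rows, one row per segment, one value per time step.
    Missing observations are NaN; segments with no observations are skipped.
    """
    per_segment = []
    for pred_row, obs_row in zip(pred, obs):
        sq_errors = [(p - o) ** 2
                     for p, o in zip(pred_row, obs_row)
                     if not math.isnan(o)]
        if sq_errors:
            per_segment.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    if not per_segment:
        raise ValueError("no observations available in any segment")
    return sum(per_segment) / len(per_segment)


nan = float("nan")
pred = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
obs = [[0.0, 0.0, 0.0, 0.0], [3.0, nan, nan, nan]]
loss = segment_balanced_rmse(pred, obs)
print(loss)
```

In this example the sparsely observed second segment contributes as much as the first. A pooled RMSE over all five observations would instead weight the first segment four times as heavily.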