Abstract:Accurately representing surface weather at the sub-kilometer scale is crucial for optimal decision-making in a wide range of applications. This motivates the use of statistical techniques to provide accurate and calibrated probabilistic predictions at a lower cost compared to numerical simulations. Wind represents a particularly challenging variable to model due to its high spatial and temporal variability. This paper presents a novel approach that integrates Gaussian processes (GPs) and neural networks to model surface wind gusts, leveraging multiple data sources, including numerical weather prediction (NWP) models, digital elevation models (DEM), and in-situ measurements. Results demonstrate the added value of modeling the multivariate covariance structure of the variable of interest, as opposed to only applying a univariate probabilistic regression approach. Modeling the covariance enables the optimal integration of observed measurements from ground stations, which is shown to reduce the continuous ranked probability score compared to the baseline. Moreover, it allows the direct generation of realistic fields that are also marginally calibrated, aided by scalable techniques such as Random Fourier Features (RFF) and pathwise conditioning. We discuss the effect of different modeling choices, as well as different degrees of approximation, and present our results for a case study.
Abstract:Diffusion models have been widely adopted in image generation, producing higher-quality and more diverse samples than generative adversarial networks (GANs). We introduce a latent diffusion model (LDM) for precipitation nowcasting - short-term forecasting based on the latest observational data. The LDM is more stable and requires less computation to train than GANs, albeit with more computationally expensive generation. We benchmark it against the GAN-based Deep Generative Models of Rainfall (DGMR) and a statistical model, PySTEPS. The LDM produces more accurate precipitation predictions, while the comparisons are more mixed when predicting whether the precipitation exceeds predefined thresholds. The clearest advantage of the LDM is that it generates more diverse predictions than DGMR or PySTEPS. Rank distribution tests indicate that the distribution of samples from the LDM accurately reflects the uncertainty of the predictions. Thus, LDMs are promising for any applications where uncertainty quantification is important, such as weather and climate.
Abstract:Weather forecasting centers currently rely on statistical postprocessing methods to minimize forecast error. This improves skill but can lead to predictions that violate physical principles or disregard dependencies between variables, which can be problematic for downstream applications and for the trustworthiness of postprocessing models, especially when they are based on new machine learning approaches. Building on recent advances in physics-informed machine learning, we propose to achieve physical consistency in deep learning-based postprocessing models by integrating meteorological expertise in the form of analytic equations. Applied to the post-processing of surface weather in Switzerland, we find that constraining a neural network to enforce thermodynamic state equations yields physically-consistent predictions of temperature and humidity without compromising performance. Our approach is especially advantageous when data is scarce, and our findings suggest that incorporating domain expertise into postprocessing models allows to optimize weather forecast information while satisfying application-specific requirements.
Abstract:Generative adversarial networks (GANs) have been recently adopted for super-resolution, an application closely related to what is referred to as "downscaling" in the atmospheric sciences. The ability of conditional GANs to generate an ensemble of solutions for a given input lends itself naturally to stochastic downscaling, but the stochastic nature of GANs is not usually considered in super-resolution applications. Here, we introduce a recurrent, stochastic super-resolution GAN that can generate ensembles of time-evolving high-resolution atmospheric fields for an input consisting of a low-resolution sequence of images of the same field. We test the GAN using two datasets, one consisting of radar-measured precipitation from Switzerland, the other of cloud optical thickness derived from the Geostationary Earth Observing Satellite 16 (GOES-16). We find that the GAN can generate realistic, temporally consistent solutions for both datasets. The statistical properties of the generated ensemble are analyzed using rank statistics, a method adapted from ensemble weather forecasting; these analyses indicate that the GAN produces close to the correct amount of variability in its outputs. As the GAN generator is fully convolutional, it can be applied after training to input images larger than the images used to train it. It is also able to generate time series much longer than the training sequences, as demonstrated by applying the generator to a three-month dataset of the precipitation radar data.