Abstract:In order to accurately reconstruct the time history of the atmospheric state, ensemble-based data assimilation algorithms need to be initialized appropriately. At present, there is no standard approach to initializing large-eddy simulation codes for microscale data assimilation. Here, given synthetic observations, we generate ensembles of plausible initial conditions using a latent diffusion model. We modify the original, two-dimensional latent diffusion model code to work on three-dimensional turbulent fields. The algorithm produces realistic and diverse samples that successfully run when inserted into a large-eddy simulation code. The samples have physically plausible turbulent structures on large and moderate spatial scales in the context of our simulations. The generated ensembles show a lower spread in the vicinity of observations while having higher variability further from the observations, matching expected behavior. Ensembles demonstrate near-zero bias relative to ground truth in the vicinity of observations, but rank histogram analysis suggests that ensembles have too little member-to-member variability when compared to an ideal ensemble. Given the success of the latent diffusion model, the generated ensembles will be tested in their ability to recreate a time history of the atmosphere when coupled to an ensemble-based data assimilation algorithm in upcoming work. We find that diffusion models show promise and potential for other applications within the geosciences.