Abstract: Machine learning models have shown great success in predicting weather up to two weeks ahead, outperforming process-based benchmarks. However, existing approaches mostly focus on the prediction task and do not incorporate the necessary data assimilation. Moreover, these models suffer from error accumulation in long roll-outs, limiting their applicability to seasonal predictions or climate projections. Here, we introduce Generative Assimilation and Prediction (GAP), a unified deep generative framework for assimilation and prediction of both weather and climate. By learning to quantify the probability distribution of atmospheric states under observational, predictive, and external forcing constraints, GAP excels in a broad range of weather- and climate-related tasks, including data assimilation, seamless prediction, and climate simulation. In particular, GAP is competitive with state-of-the-art ensemble assimilation, probabilistic weather forecasting, and seasonal prediction, yields stable millennial simulations, and reproduces climate variability from daily to decadal time scales.
Abstract: Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise in representing features of text and in how students link text represented as character strings to the identification of features that embed connections with underlying phenomena. To observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we employed a task-based interview method using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task: classifying headlines as "clickbait" or "news". Three types of features (function, content, and form) surfaced, the majority arising in the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human-perception level and the computer-extraction level, and in conceptualizing connections between the two.