Abstract:Graph learning on molecules makes use of information from both the molecular structure and the features attached to that structure. Much work has been conducted on biasing either towards structure or features, with the aim that bias bolsters performance. Identifying which information source a dataset favours, and therefore how to approach learning that dataset, is an open issue. Here we propose Noise-Noise Ratio Difference (NNRD), a quantitative metric for whether there is more useful information in structure or features. By employing iterative noising on features and structure independently, leaving the other intact, NNRD measures the degradation of information in each. We employ NNRD over a range of molecular tasks, and show that it corresponds well to a loss of information, with intuitive results that are more expressive than simple performance aggregates. Our future work will focus on expanding data domains, tasks and types, as well as refining our choice of baseline model.
Abstract:Multi-step forecasting (MSF) in time-series, the ability to make predictions multiple time steps into the future, is fundamental to almost all temporal domains. To make such forecasts, one must assume the recursive complexity of the temporal dynamics. Such assumptions are referred to as the forecasting strategy used to train a predictive model. Previous work shows that it is not clear which forecasting strategy is optimal a priori to evaluating on unseen data. Furthermore, current approaches to MSF use a single (fixed) forecasting strategy. In this paper, we characterise the instance-level variance of optimal forecasting strategies and propose Dynamic Strategies (DyStrat) for MSF. We experiment using 10 datasets from different scales, domains, and lengths of multi-step horizons. When using a random-forest-based classifier, DyStrat outperforms the best fixed strategy, which is not knowable a priori, 94% of the time, with an average reduction in mean-squared error of 11%. Our approach typically triples the top-1 accuracy compared to current approaches. Notably, we show DyStrat generalises well for any MSF task.