Abstract: Predicting how the Global Mean Temperature (GMT) will evolve over the next few decades is important. The ability to reproduce historical data is a necessary first step toward the actual goal of making long-range forecasts. This paper examines the advantages of statistical and simpler Machine Learning (ML) methods over the direct use of complex ML algorithms and Deep Learning Neural Networks (DNN). Data transformations applied before the algorithms, a step that is often neglected, are used as a means of improving predictive accuracy. The GMT time series is treated both as a univariate time series and as a regression problem. Some of the data transformation steps were found to be effective. Several simple ML methods performed as well as or better than the more well-known ones, which shows the merit of trying a large set of algorithms as a first step. Fifty-six algorithms were evaluated with Box-Cox, Yeo-Johnson, and first-order differencing transformations and compared with the same algorithms run without them. Predictions for the annual GMT test data were better than those published so far, with a lowest RMSE of 0.02 $^\circ$C. RMSE for five-year mean GMT values on the test data ranged from 0.00002 to 0.00036 $^\circ$C.
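The three transformations named in this abstract, together with the RMSE metric, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's pipeline: the function names, the fixed `lam` values, and the sample anomaly values are all hypothetical, and in practice the transformation parameter would normally be estimated from the training data only.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform for a fixed lambda; defined only for positive values."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def yeo_johnson(x, lam):
    """Yeo-Johnson transform for a fixed lambda; also accepts zero and negatives."""
    out = []
    for v in x:
        if v >= 0:
            if lam == 0:
                out.append(math.log1p(v))
            else:
                out.append(((v + 1.0) ** lam - 1.0) / lam)
        else:
            if lam == 2:
                out.append(-math.log1p(-v))
            else:
                out.append(-(((-v + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam))
    return out


def difference(x):
    """First-order differencing: x[t] - x[t-1]."""
    return [b - a for a, b in zip(x, x[1:])]


def undifference(first, diffs):
    """Invert first-order differencing given the first original value."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    )


if __name__ == "__main__":
    # Hypothetical temperature anomalies in degrees C (not data from the paper).
    anomalies = [-0.10, 0.05, 0.12, 0.20, 0.31]
    print(yeo_johnson(anomalies, 0.5))
    print(difference(anomalies))
```

Temperature anomalies can be negative, so Box-Cox would require shifting the series to positive values first, whereas Yeo-Johnson can be applied directly. Predictions made on a transformed or differenced series must be mapped back (for example with `undifference`) before computing RMSE in $^\circ$C.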
Abstract: A univariate time series with high variability can pose a challenge even to a Deep Neural Network (DNN). To overcome this, the series is decomposed into simpler constituent series whose sum equals the original series. As demonstrated in this article, the conventional one-time decomposition technique leaks information from the future, a problem referred to as data leakage. In this work, a novel Moving Front (MF) method is proposed to prevent data leakage, so that the decomposed series can be treated like any other time series. Indian Summer Monsoon Rainfall (ISMR) is a very complex time series that poses a challenge to DNNs and is therefore selected as an example. Among the many signal processing tools available, the Empirical Wavelet Transform (EWT) was chosen for decomposing the ISMR into simpler constituent series, as it was found to be more effective than another popular algorithm, Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). The proposed MF method was used to generate the leakage-free constituent time series. Predictions and forecasts were made with a state-of-the-art Long Short-Term Memory (LSTM) network architecture, which is especially suitable for predicting sequential patterns. The constituent MF series were divided into training, testing, and forecasting segments. The model developed here (EWT-MF-LSTM) made exceptionally good training and test predictions, as well as Walk-Forward Validation (WFV) forecasts, with Performance Parameter ($PP$) values of 0.99, 0.86, and 0.95, respectively, where $PP$ = 1.0 signifies perfect reproduction of the data.
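The leakage problem described in this abstract can be illustrated with a small Python sketch. Two things here are assumptions rather than the paper's method. First, the abstract does not spell out how the Moving Front is computed, so the sketch uses one plausible reading: re-decompose an expanding window that ends at each time step and keep only the newest value of every component. Second, a centered moving average plus residual stands in for EWT purely to keep the example self-contained. The names `decompose`, `moving_front`, `min_len`, and `window` are hypothetical.

```python
def decompose(series, window=3):
    """Stand-in decomposition (not EWT): centered moving average plus residual.

    The centered average at index i uses values after i, so decomposing the
    whole series once lets future values influence earlier component values.
    """
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    resid = [s - t for s, t in zip(series, trend)]
    return [trend, resid]


def moving_front(series, min_len=3, window=3):
    """Expanding-window decomposition (assumed reading of a moving front).

    For each end point, decompose only series[:end] and keep the last value
    of each component, so no component value depends on later observations.
    Returns one list per component, aligned with series[min_len - 1:].
    """
    components = [[], []]
    for end in range(min_len, len(series) + 1):
        parts = decompose(series[:end], window)
        for k, part in enumerate(parts):
            components[k].append(part[-1])
    return components


if __name__ == "__main__":
    # Hypothetical values, not ISMR data.
    series = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
    altered = series[:-1] + [100.0]  # change only the final observation

    # One-time decomposition: an earlier trend value changes with the future.
    print(decompose(series)[0][4], decompose(altered)[0][4])

    # Moving front: every value except the last one is unaffected.
    print(moving_front(series)[0][:-1] == moving_front(altered)[0][:-1])
```

In this sketch the components still sum to the original value at each retained time step, which matches the requirement stated in the abstract that the constituent series add up to the original series.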