Abstract: Three main points: 1. Data science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.
Abstract: Equatorial Plasma Bubbles (EPBs) are plumes of low-density plasma that rise from the bottomside of the F layer towards the exosphere. EPBs are a known cause of radio wave scintillations, which can degrade communications with spacecraft. We build a random forest regressor to predict and forecast the probability (between 0 and 1) of an EPB detected by the IBI processor on board the Swarm spacecraft. We use 8 years of Swarm data, from 2014 to 2021, and transform the data from a time series into a five-dimensional space consisting of latitude, longitude, magnetic local time (MLT), year, and day of the year. We also add Kp, F10.7 cm, and solar wind speed. The observations of EPBs with respect to geolocation, local time, season, and solar activity mostly agree with existing work, whilst the link to geomagnetic activity is less clear. The prediction has an accuracy of 88% and performs well across the EPB-specific spatiotemporal scales. This shows that the XGBoost method is able to capture the climatological and daily variability of Swarm EPBs. Capturing the daily variance has long evaded researchers because of local and stochastic features within the ionosphere. We take advantage of Shapley values to explain the model and to gain insight into the physics of EPBs. We find that as the solar wind speed increases, the probability of an EPB decreases. We also identify a spike in EPB probability around the Earth-Sun perihelion. Both of these insights were derived directly from the XGBoost and Shapley technique.
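The transformation from a time series into the five-dimensional feature space described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Sample` record, its field names, and the `to_features` helper are hypothetical, the two records are made-up placeholder values rather than Swarm measurements, and MLT is assumed to be supplied with each record rather than computed here.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Sample(NamedTuple):
    """One time-series record (hypothetical field names)."""
    time: datetime          # observation timestamp
    latitude: float         # degrees
    longitude: float        # degrees
    mlt: float              # magnetic local time in hours, assumed precomputed
    epb_probability: float  # regression target in [0, 1]


def to_features(sample: Sample) -> tuple:
    """Map a record to (latitude, longitude, MLT, year, day-of-year)."""
    t = sample.time.astimezone(timezone.utc)
    return (
        sample.latitude,
        sample.longitude,
        sample.mlt,
        t.year,
        t.timetuple().tm_yday,  # 1 on 1 January
    )


# Placeholder records for illustration only; not real observations.
samples = [
    Sample(datetime(2014, 1, 3, 22, 15, tzinfo=timezone.utc), -5.2, 310.4, 21.0, 0.8),
    Sample(datetime(2021, 12, 31, 1, 40, tzinfo=timezone.utc), 8.7, 45.1, 4.6, 0.1),
]

X = [to_features(s) for s in samples]        # feature rows for a regressor
y = [s.epb_probability for s in samples]     # targets in [0, 1]
```

Dropping the absolute timestamp in favour of year and day of the year lets a tree-based regressor split on season and on the multi-year trend separately. The additional drivers named in the abstract (Kp, F10.7 cm, solar wind speed) would be appended to each row as further columns.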