University of California, Santa Cruz
Abstract:This thesis presents a new algorithm to mitigate cloud masking in the analysis of sea surface temperature (SST) data generated by remote sensing technologies, e.g., Clouds interfere with the analysis of all remote sensing data using wavelengths shorter than 12 microns, significantly limiting the quantity of usable data and creating a biased geographical distribution (towards equatorial and coastal regions). To address this issue, we propose an unsupervised machine learning algorithm called Enki which uses a Vision Transformer with Masked Autoencoding to reconstruct masked pixels. We train four different models of Enki with varying mask ratios (t) of 10%, 35%, 50%, and 75% on the generated Ocean General Circulation Model (OGCM) dataset referred to as LLC4320. To evaluate performance, we reconstruct a validation set of LLC4320 SST images with random ``clouds'' corrupting p=10%, 20%, 30%, 40%, 50% of the images with individual patches of 4x4 pixel^2. We consistently find that at all levels of p there is one or multiple models that reconstruct the images with a mean RMSE of less than 0.03K, i.e. lower than the estimated sensor error of VIIRS data. Similarly, at the individual patch level, the reconstructions have RMSE 8x smaller than the fluctuations in the patch. And, as anticipated, reconstruction errors are larger for images with a higher degree of complexity. Our analysis also reveals that patches along the image border have systematically higher reconstruction error; we recommend ignoring these in production. We conclude that Enki shows great promise to surpass in-painting as a means of reconstructing cloud masking. Future research will develop Enki to reconstruct real-world data.