Abstract: Micropaleontology studies the evolution of microfossils (e.g., foraminifera) through the geological record to reconstruct past environmental and climatic conditions. The field relies heavily on visual recognition of microfossil features, which makes it well suited to computer vision, specifically deep convolutional neural networks (CNNs), for automating and optimizing microfossil identification and classification. However, the application of deep learning in micropaleontology is hindered by the limited availability of high-quality, high-resolution labeled fossil images and by the significant manual labeling effort required from experts. To address these challenges, we propose a novel deep learning workflow that combines hierarchical vision transformers with style-based generative adversarial networks to efficiently acquire and synthetically generate large volumes of realistic, high-resolution labeled micropaleontological datasets. Our study shows that this workflow can generate high-resolution images with a high signal-to-noise ratio (39.1 dB) and realistic synthetic images with a Fréchet inception distance (FID) score of 14.88. Additionally, the workflow provides a large volume of self-labeled datasets for model benchmarking and for downstream visual tasks, including fossil classification and segmentation. For the first time, we performed few-shot semantic segmentation of different foraminifera chambers on both generated and synthetic images with high accuracy. This novel meta-learning approach is only possible with the availability of high-resolution, high-volume labeled datasets. Our deep learning-based workflow shows promise for advancing and optimizing micropaleontological research and other geological analyses that depend on visual interpretation.