Machine-learning tasks on seismic data are often trained sequentially and separately, even though they rely on the same features (i.e., geometrical features) of the data. We present StorSeismic, a framework for seismic data processing that consists of neural network pre-training and fine-tuning procedures. Specifically, we use a neural network as a preprocessing model to store the seismic data features of a particular dataset for any downstream task. After pre-training, the resulting model can be adapted later, through a fine-tuning procedure, to perform specific tasks with limited additional training. Widely used in Natural Language Processing (NLP) and, more recently, in vision tasks, BERT (Bidirectional Encoder Representations from Transformers), a form of Transformer model, provides a well-suited platform for this framework. The attention mechanism of BERT, applied here to a sequence of traces within a shot gather, is able to capture and store key geometrical features of the seismic data. We pre-train StorSeismic on field data, along with synthetically generated data, in a self-supervised step. Then, we use labeled synthetic data to fine-tune the pre-trained network in a supervised fashion to perform various seismic processing tasks, such as denoising, velocity estimation, first-arrival picking, and normal moveout (NMO) correction. Finally, the fine-tuned model is used to obtain satisfactory inference results on the field data.
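
To make the self-supervised pre-training step more concrete, the following is a minimal sketch of how a shot gather could be prepared as a sequence of traces for a BERT-style objective. It assumes that pre-training hides a random subset of traces and scores the reconstruction only on the hidden ones, by analogy with masked language modeling; the abstract does not specify the objective. The function names `mask_traces` and `masked_mse`, the 15% masking ratio, and the zero-fill masking are all hypothetical choices made for illustration.

```python
import random


def mask_traces(gather, mask_ratio=0.15, seed=0):
    """Zero out a random subset of traces in a shot gather.

    gather: list of traces, each trace a list of float samples.
    Returns the masked copy and the set of masked trace indices.
    Zero-fill masking and the 15% ratio are assumptions for illustration.
    """
    rng = random.Random(seed)
    n_traces = len(gather)
    n_masked = max(1, round(mask_ratio * n_traces))
    masked_idx = set(rng.sample(range(n_traces), n_masked))
    masked = [
        [0.0] * len(trace) if i in masked_idx else list(trace)
        for i, trace in enumerate(gather)
    ]
    return masked, masked_idx


def masked_mse(prediction, target, masked_idx):
    """Mean squared error computed only over the masked traces."""
    squared_errors = [
        (p - t) ** 2
        for i in sorted(masked_idx)
        for p, t in zip(prediction[i], target[i])
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy gather: 20 traces of 8 samples each (random numbers, not seismic data).
data_rng = random.Random(1)
gather = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]

masked_input, masked_idx = mask_traces(gather)

# A network would map masked_input to a reconstruction of the full gather.
# Here the untouched masked input stands in for an untrained prediction.
baseline_loss = masked_mse(masked_input, gather, masked_idx)
```

In such a setup, the model would receive `masked_input` and be trained to reduce `masked_mse` against the original gather, which requires no labels. Fine-tuning would then swap the reconstruction target for a task-specific one, such as a denoised gather or picked first arrivals.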