Abstract: Missing or erroneous data is a major problem today. Collected seismic data sometimes contain gaps for a multitude of reasons, such as interference and sensor malfunction. Gaps in seismic waveforms hamper the further signal processing needed to extract valuable information. A plethora of techniques are used for data reconstruction in other domains such as image, video, and audio, but applying those methods to seismic waveforms requires adapting them to lengthy input sequences, which is complex in practice. Even if that is accomplished, high computational costs and inefficiency would still persist in these predominantly convolution-based reconstruction models. In this paper, we present a transformer-based deep learning model, Xi-Net, which utilizes multi-faceted time and frequency domain inputs for accurate waveform reconstruction. Xi-Net converts the input waveform to the frequency domain, employs separate encoders for the time and frequency domains, and uses a single decoder to produce the reconstructed output waveform from the fused features. 1D shifted-window transformer blocks form the elementary units of all parts of the model. To the best of our knowledge, this is the first transformer-based deep learning model for seismic waveform reconstruction. We demonstrate the model's capability by filling random gaps of 0.5-1 s in 120 s waveforms, producing reconstructions that closely resemble the original waveforms. The code and models can be found at: https://github.com/Anshuman04/waveformReconstructor.
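The input preparation described in this abstract (a waveform with a short random gap, viewed in both the time and frequency domains) can be illustrated with a minimal sketch. This is not the Xi-Net code: the function name `make_gapped_input`, the 100 Hz sampling rate, the zero-filling of the gap, and the real/imaginary stacking of the spectrum are all assumptions made for illustration, and the transformer encoders and decoder are not shown.

```python
import numpy as np

def make_gapped_input(waveform, fs, min_gap_s=0.5, max_gap_s=1.0, rng=None):
    """Zero out one random gap and return time- and frequency-domain views.

    Hypothetical helper for illustration; not taken from the Xi-Net repository.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = waveform.shape[0]
    # Gap duration drawn uniformly from [min_gap_s, max_gap_s), converted to samples.
    gap_len = int(round(rng.uniform(min_gap_s, max_gap_s) * fs))
    start = int(rng.integers(0, n - gap_len + 1))
    mask = np.ones(n, dtype=bool)
    mask[start:start + gap_len] = False          # False marks missing samples
    gapped = np.where(mask, waveform, 0.0)       # assumption: gaps are zero-filled
    spectrum = np.fft.rfft(gapped)               # frequency-domain view of the gapped trace
    freq_input = np.stack([spectrum.real, spectrum.imag])
    return gapped, freq_input, mask

fs = 100                                         # assumed sampling rate in Hz
t = np.arange(120 * fs) / fs                     # 120 s of samples
waveform = np.sin(2 * np.pi * 1.5 * t)           # synthetic stand-in for a seismogram
gapped, freq_input, mask = make_gapped_input(
    waveform, fs, rng=np.random.default_rng(0)
)
```

In a setup like this, `gapped` would feed a time-domain encoder, `freq_input` a frequency-domain encoder, and `mask` would identify the samples on which a reconstruction is scored.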
Abstract: Similarity search is a popular technique for seismic signal processing, with template matching, matched filters and subspace detectors being utilized for a wide variety of tasks, including both signal detection and source discrimination. Traditionally, these techniques rely on the cross-correlation function as the basis for measuring similarity. Unfortunately, seismogram correlation is dominated by path effects, essentially requiring a distinct waveform template along each path to be detected. To address this limitation, we define a path-invariant measure for seismogram similarity. A deep convolutional neural network with a triplet loss function maps raw seismograms to a low-dimensional embedding space, where nearness in the space corresponds to nearness of source function, regardless of path or recording instrumentation. This path-agnostic embedding space constitutes a new representation for seismograms, characterized by robust, source-specific features. The dataset used to train and test the algorithm comes primarily from the USArray experiment, a temporary network of 400 seismometers that was deployed at more than 2000 locations across the US from 2007 to 2012. The first four years (2006, 2007, 2008, 2009) were selected as the training set, and the following two years (2010, 2011) were selected for validation and testing. The training, validation and test sets contained 24,811, 5,711 and 4,214 seismograms, respectively. The utility of our novel embedding space representation is evaluated across three common seismic tasks: event association, signal detection, and source discrimination, achieving accuracies of 80%, 92% and 90%, respectively, all while minimizing the number of template waveforms required.
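The triplet loss mentioned in this abstract can be sketched in its common hinge form: an anchor embedding is pulled toward a positive (same source) and pushed away from a negative (different source) by at least a margin. The abstract does not specify the distance metric, the margin, or the embedding dimension, so the Euclidean distance, `margin=1.0`, and the two-dimensional toy embeddings below are assumptions; the convolutional network that would produce the embeddings is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value and the unsquared Euclidean distance are assumptions.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy embeddings (illustrative only): same-source pair close together.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
far_negative = [2.0, 0.0]    # already farther than the margin: no penalty
near_negative = [0.5, 0.0]   # too close to the anchor: positive loss

easy_loss = triplet_loss(anchor, positive, far_negative)
hard_loss = triplet_loss(anchor, positive, near_negative)
```

Only triplets that violate the margin, such as the `near_negative` case, contribute a nonzero loss and therefore a gradient during training.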