The use of the Audio Spectrogram Transformer (AST) model for gravitational-wave data analysis is investigated. The AST machine-learning model is a convolution-free classifier that captures long-range global dependencies through a purely attention-based mechanism. In this paper the model is applied to a simulated dataset of inspiral gravitational-wave signals from binary neutron star coalescences, built from five distinct cold equations of state (EOS) of nuclear matter. From the analysis of the mass dependence of the tidal deformability parameter for each EOS class, it is shown that the AST model achieves promising performance in correctly classifying the EOS purely from the gravitational-wave signals, especially when the component masses of the binary system are in the range $[1, 1.5]\,M_{\odot}$. Furthermore, the generalization ability of the model is investigated using gravitational-wave signals from a new EOS not used during training, with fairly satisfactory results. Overall, the results, obtained in the simplified setup of noise-free waveforms, show that the AST model, once trained, might allow for the instantaneous inference of the cold nuclear matter EOS directly from the inspiral gravitational-wave signals produced in binary neutron star coalescences.