Highly accurate time-series vibration prediction is an important research issue for electric vehicles (EVs). EVs often experience vibrations when driving on rough terrains, known as torsional resonance. This resonance, caused by the interaction between motor and tire vibrations, puts excessive loads on the vehicle's drive shaft. However, current damping technologies only detect resonance after the vibration amplitude of the drive shaft torque reaches a certain threshold, leading to significant loads on the shaft at the time of detection. In this study, we propose a novel approach to address this issue by introducing Resoformer, a transformer-based model for predicting torsional resonance. Resoformer utilizes time-series of the motor rotation speed as input and predicts the amplitude of torsional vibration at a specified quantile occurring in the shaft after the input series. By calculating the attention between recursive and convolutional features extracted from the measured data points, Resoformer improves the accuracy of vibration forecasting. To evaluate the model, we use a vibration dataset called VIBES (Dataset for Forecasting Vibration Transition in EVs), consisting of 2,600 simulator-generated vibration sequences. Our experiments, conducted on strong baselines built on the VIBES dataset, demonstrate that Resoformer achieves state-of-the-art results. In conclusion, our study answers the question "Can Transformers Forecast Vibrations?" While traditional transformer architectures show low performance in forecasting torsional resonance waves, our findings indicate that combining recurrent neural network and temporal convolutional network using the transformer architecture improves the accuracy of long-term vibration forecasting.