Vision-based methods are commonly used in robotic arm activity recognition. These approaches typically rely on line-of-sight (LoS) and raise privacy concerns, particularly in smart home applications. Passive Wi-Fi sensing represents a new paradigm for recognizing human and robotic arm activities, utilizing channel state information (CSI) measurements to identify activities in indoor environments. In this paper, a novel machine learning approach based on discrete wavelet transform and vision transformers for robotic arm activity recognition from CSI measurements in indoor settings is proposed. This method outperforms convolutional neural network (CNN) and long short-term memory (LSTM) models in robotic arm activity recognition, particularly when LoS is obstructed by barriers, without relying on external or internal sensors or visual aids. Experiments are conducted using four different data collection scenarios and four different robotic arm activities. Performance results demonstrate that wavelet transform can significantly enhance the accuracy of visual transformer networks in robotic arms activity recognition.