This work introduces a tensor-based method to perform supervised classification on spatiotemporal data processed in an echo state network. Typically when performing supervised classification tasks on data processed in an echo state network, the entire collection of hidden layer node states from the training dataset is shaped into a matrix, allowing one to use standard linear algebra techniques to train the output layer. However, the collection of hidden layer states is multidimensional in nature, and representing it as a matrix may lead to undesirable numerical conditions or loss of spatial and temporal correlations in the data. This work proposes a tensor-based supervised classification method on echo state network data that preserves and exploits the multidimensional nature of the hidden layer states. The method, which is based on orthogonal Tucker decompositions of tensors, is compared with the standard linear output weight approach in several numerical experiments on both synthetic and natural data. The results show that the tensor-based approach tends to outperform the standard approach in terms of classification accuracy.