Supervised learning techniques are at the center of many tasks in remote sensing. Unfortunately, these methods, especially recent deep learning methods, often require large amounts of labeled data for training. Even though satellites acquire large amounts of data, labeling the data is often tedious, expensive and requires expert knowledge. Hence, improved methods that require fewer labeled samples are needed. We present MSMatch, the first semi-supervised learning approach competitive with supervised methods on scene classification on the EuroSAT benchmark dataset. We test both RGB and multispectral images and perform various ablation studies to identify the critical parts of the model. The trained neural network achieves state-of-the-art results on EuroSAT with an accuracy that is between 1.98% and 19.76% better than previous methods depending on the number of labeled training examples. With just five labeled examples per class we reach 94.53% and 95.86% accuracy on the EuroSAT RGB and multispectral datasets, respectively. With 50 labels per class we reach 97.62% and 98.23% accuracy. Our results show that MSMatch is capable of greatly reducing the requirements for labeled data. It translates well to multispectral data and should enable various applications that are currently infeasible due to a lack of labeled data. We provide the source code of MSMatch online to enable easy reproduction and quick adoption.