Inferring the location of a mobile device in an indoor setting is an open problem of utmost significance. A leading approach that does not require the deployment of expensive infrastructure is fingerprinting, where a classifier is trained to predict the location of a device based on its captured signal. The main drawback of this approach is that acquiring a sufficiently large and accurate training set may be prohibitively expensive. Here, we propose a weakly supervised method that requires the locations of only a small number of devices. Localization is performed by matching a low-dimensional spectral representation of the signals to a given sketch of the indoor environment. We test our approach on simulated and real data and show that it yields an accuracy of a few meters, which is on par with fully supervised approaches. The simplicity of our method and its accuracy with minimal supervision make it ideal for implementation in indoor localization systems.