Biologically inspired algorithms for simultaneous localization and mapping (SLAM) such as RatSLAM have been shown to yield effective and robust robot navigation in both indoor and outdoor environments. One drawback however is the sensitivity to perceptual aliasing due to the template matching of low-dimensional sensory templates. In this paper, we propose an unsupervised representation learning method that yields low-dimensional latent state descriptors that can be used for RatSLAM. Our method is sensor agnostic and can be applied to any sensor modality, as we illustrate for camera images, radar range-doppler maps and lidar scans. We also show how combining multiple sensors can increase the robustness, by reducing the number of false matches. We evaluate on a dataset captured with a mobile robot navigating in a warehouse-like environment, moving through different aisles with similar appearance, making it hard for the SLAM algorithms to disambiguate locations.