Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks, such as interpolating between learned locations, and compare its performance to that of a hand-crafted representation which has, by design, a high degree of geometric consistency. Our comparison of these two models demonstrates that it is advantageous to include information about the persistence of features as the camera translates (e.g. distant features persist). It is likely that non-Cartesian representations of this sort will be increasingly important in the search for robust models of human spatial perception and navigation.