Our aim is to estimate the perspective-effected geometric distortion of a scene from a video feed. In contrast to all previous work, we wish to achieve this using low-level, spatio-temporally local motion features of the kind used in commercial semi-automatic surveillance systems. We: (i) describe a dense algorithm which uses motion features to estimate the perspective distortion at each image locus and then polls all such local estimates to arrive at the globally best estimate, (ii) present an alternative coarse algorithm which subdivides the image frame into blocks, uses motion features to derive block-specific motion characteristics, and constrains the relationships between these characteristics, with the perspective estimate emerging from a global optimization scheme, and (iii) report the results of an evaluation on nine large data sets acquired using existing closed-circuit television (CCTV) cameras. Our findings demonstrate that both of the proposed methods are successful, with their accuracy matching that of human labelling performed using complete visual data.
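To make the polling idea of the dense approach concrete, the sketch below shows one way local motion features could each yield a perspective estimate that is then polled by histogram voting. This is only an illustrative toy, not the method of the paper: the linear perspective model (apparent speed growing linearly with the vertical image coordinate), the validity threshold, the bin count, and all function names are assumptions introduced for this example.

```python
import numpy as np

def local_perspective_estimates(positions, speeds, ref_speed=1.0):
    """Per-locus estimates under an assumed linear perspective model:
    apparent speed ~ ref_speed * (1 + k * y), so each motion feature at
    vertical coordinate y with observed speed s gives a local estimate
    k = (s / ref_speed - 1) / y."""
    y = positions[:, 1]
    # discard near-degenerate loci (small y) where the estimate is
    # ill-conditioned; the threshold is arbitrary for this sketch
    valid = y > 10.0
    return (speeds[valid] / ref_speed - 1.0) / y[valid]

def poll_estimates(local_k, bins=200):
    """Poll all local estimates by histogram voting and return the centre
    of the most supported bin, i.e. the globally best estimate."""
    counts, edges = np.histogram(local_k, bins=bins)
    best = np.argmax(counts)
    return 0.5 * (edges[best] + edges[best + 1])

# toy usage: synthetic features consistent with a distortion rate k = 0.01
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 480.0, size=(1000, 2))
spd = 1.0 * (1.0 + 0.01 * pos[:, 1]) + rng.normal(0.0, 0.02, 1000)
k_hat = poll_estimates(local_perspective_estimates(pos, spd))
print(f"estimated perspective rate: {k_hat:.4f}")
```

Histogram voting is used here merely as one plausible instance of "polling" the dense local estimates; any robust mode-seeking or consensus scheme could play the same role.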