Bayesian networks have been used extensively in diagnostic tasks such as medicine, where they represent the dependency relations between a set of symptoms and a set of diseases. A criticism of this type of knowledge representation is that it is restricted to this kind of task, and that it cannot cope with the knowledge required in other artificial intelligence applications. For example, in computer vision, we require the ability to model complex knowledge, including temporal and relational factors. In this paper we extend Bayesian networks to model relational and temporal knowledge for high-level vision. These extended networks have a simple structure which permits us to propagate probability efficiently. We have applied them to the domain of endoscopy, illustrating how the general modelling principles can be used in specific cases.