Abstract:We have been developing a system for recognising human activity given a symbolic representation of video content. The input of our system is a set of time-stamped short-term activities detected on video frames. The output of our system is a set of recognised long-term activities, which are pre-defined temporal combinations of short-term activities. The constraints on the short-term activities that, if satisfied, lead to the recognition of a long-term activity, are expressed using a dialect of the Event Calculus. We illustrate the expressiveness of the dialect by showing the representation of several typical complex activities. Furthermore, we present a detailed evaluation of the system through experimentation on a benchmark dataset of surveillance videos.