Multimedia data is highly expressive and has traditionally been difficult for machines to interpret. Middleware systems such as complex event processing (CEP) mine patterns from data streams and send notifications to users in a timely fashion. Presently, CEP systems have inherent limitations in processing multimedia streams due to their data complexity and the lack of an underlying structured data model. In this work, we present a visual event specification method that enables complex multimedia event processing by creating a semantic knowledge representation derived from low-level media streams. The method enables the detection of high-level semantic concepts from media streams using an ensemble of pattern detection capabilities. The semantic model is aligned with the deep learning models of a multimedia CEP engine, giving end-users the flexibility to build rules using spatiotemporal event calculus. This enhances the capability of CEP to detect patterns from media streams and bridges the semantic gap between highly expressive, knowledge-centric user queries and the low-level features of multimedia data. We have built a small traffic event ontology prototype to validate the approach and its performance. The contribution of this paper is threefold: i) we present a knowledge graph representation for multimedia streams, ii) we propose a hierarchical event network to detect visual patterns from media streams, and iii) we define complex pattern rules for multimedia event reasoning using event calculus.