Parking spaces are costly to build, parking payments are difficult to enforce, and drivers waste an excessive amount of time searching for empty lots. Accurate quantification would inform developers and municipalities in space allocation and design, while real-time measurements would provide drivers and parking enforcement with information that saves time and resources. In this paper, we propose an accurate and real-time video system for future Internet of Things (IoT) and smart cities applications. Using recent developments in deep convolutional neural networks (DCNNs) and a novel vehicle tracking filter, we combine information across multiple image frames in a video sequence to remove noise introduced by occlusions and detection failures. We demonstrate that our system achieves higher accuracy than pure image-based instance segmentation, and is comparable in performance to industry benchmark systems that utilize more expensive sensors such as radar. Furthermore, our system shows significant potential in its scalability to a city-wide scale and also in the richness of its output that goes beyond traditional binary occupancy statistics.