With the proliferation of video data in smart city applications like intelligent transportation, efficient video analytics has become crucial but also challenging. This paper proposes a semantics-driven cloud-edge collaborative approach for accelerating video inference, using license plate recognition as a case study. The method separates semantics extraction and recognition, allowing edge servers to only extract visual semantics (license plate patches) from video frames and offload computation-intensive recognition to the cloud or neighboring edges based on load. This segmented processing coupled with a load-aware work distribution strategy aims to reduce end-to-end latency and improve throughput. Experiments demonstrate significant improvements in end-to-end inference speed (up to 5x faster), throughput (up to 9 FPS), and reduced traffic volumes (50% less) compared to cloud-only or edge-only processing, validating the efficiency of the proposed approach. The cloud-edge collaborative framework with semantics-driven work partitioning provides a promising solution for scaling video analytics in smart cities.