While the first CCTV camera was developed almost a century ago back in 1927, currently it is assumed as granted there are about 770 millions CCTV cameras around the globe, and their number is casually predicted to surpass 1 billion in 2021. At the same time the increasing, widespread, unwarranted, and unaccountable use of CCTV cameras globally raises privacy risks and concerns for the last several decades. Recent technological advances implemented in CCTV cameras such as AI-based facial recognition and IoT connectivity only fuel further these concerns raised by privacy-minded persons. However, many of the debates, reports, and policies are based on assumptions and numbers that are neither necessarily factually accurate nor are based on sound methodologies. For example, at present there is no accurate and global understanding of how many CCTV cameras are deployed and in use, where are those cameras located, who owns or operates those cameras, etc. In addition, there are no proper (i.e., sound, accurate, advanced) tools that can help us achieve such counting, localization, and other information gathering. Therefore, new methods and tools must be developed in order to detect, count, and localize the CCTV cameras. To close this gap, with this paper we introduce the first and only computer vision MS COCO-compatible models that are able to accurately detect CCTV and video surveillance cameras in images and video frames. To this end, our state-of-the-art detector was built using 3401 images that were manually reviewed and annotated, and achieves an accuracy between 91,1% - 95,6%. Moreover, we build and evaluate multiple models, present a comprehensive comparison of their performance, and outline core challenges associated with such research. We also present possible privacy-, safety-, and security-related practical applications of our core work.