Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ryan J. Schlueter

Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Dec 31, 2018

Caleb Tung, Matthew R. Kelleher, Ryan J. Schlueter, Binhan Xu, Yung-Hsiang Lu, George K. Thiruvathukal, Yen-Kuang Chen, Yang Lu

Figure 1 for Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Figure 2 for Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Figure 3 for Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Figure 4 for Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

Abstract:Computer vision relies on labeled datasets for training and evaluation in detecting and recognizing objects. The popular computer vision program, YOLO ("You Only Look Once"), has been shown to accurately detect objects in many major image datasets. However, the images found in those datasets, are independent of one another and cannot be used to test YOLO's consistency at detecting the same object as its environment (e.g. ambient lighting) changes. This paper describes a novel effort to evaluate YOLO's consistency for large-scale applications. It does so by working (a) at large scale and (b) by using consecutive images from a curated network of public video cameras deployed in a variety of real-world situations, including traffic intersections, national parks, shopping malls, university campuses, etc. We specifically examine YOLO's ability to detect objects in different scenarios (e.g., daytime vs. night), leveraging the cameras' ability to rapidly retrieve many successive images for evaluating detection consistency. Using our camera network and advanced computing resources (supercomputers), we analyzed more than 5 million images captured by 140 network cameras in 24 hours. Compared with labels marked by humans (considered as "ground truth"), YOLO struggles to consistently detect the same humans and cars as their positions change from one frame to the next; it also struggles to detect objects at night time. Our findings suggest that state-of-the art vision solutions should be trained by data from network camera with contextual information before they can be deployed in applications that demand high consistency on object detection.

* Submitted to MIPR 2019 (Accepted)

Via

Access Paper or Ask Questions